Search Results

Found 29 documents matching the query.
Abstract:
Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience.
Berlin: Springer, 2012
e20398760
eBooks  Universitas Indonesia Library
Abstract:
This book constitutes revised and selected papers of the 9th European Workshop on Reinforcement Learning, EWRL 2011, which took place in Athens, Greece, in September 2011. The papers presented were carefully reviewed and selected from 40 submissions. They are organized in topical sections on online reinforcement learning; learning and exploring MDPs; function approximation methods for reinforcement learning; macro-actions in reinforcement learning; policy search and bounds; multi-task and transfer reinforcement learning; multi-agent reinforcement learning; apprenticeship and inverse reinforcement learning; and real-world reinforcement learning.
Berlin: Springer-Verlag, 2012
e20409054
eBooks  Universitas Indonesia Library
Dandung Sektian
Abstract:
Level control is important in many industrial fields, including the chemical, petroleum, fertilizer, and automotive industries. In this study, a non-conventional controller was designed using Reinforcement Learning with a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent. The agent was applied to a miniature plant containing water as its fluid. The miniature plant is built from several components: a flow transmitter, a level transmitter, a ball valve, a control valve, a PLC, and a water pump. The TD3 agent controller was designed in MATLAB/Simulink on a computer; flow-rate and water-level data were acquired through the flow and level transmitters, connected via OPC as the link to MATLAB/Simulink. The TD3 agent was applied to the water-level control system under two conditions: on the real plant and in simulation. The study found that the TD3 agent controller controls the system well, with a small overshoot of 0.57 in simulation and 0.97 on the real plant.
Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021
S-pdf
UI - Skripsi Membership  Universitas Indonesia Library
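The TD3 algorithm used in the thesis above curbs value overestimation by keeping two critics and bootstrapping from the smaller of their target estimates, with clipped noise added to the target action. A minimal NumPy sketch of that target computation, with the actor and both critics replaced by toy linear stand-ins (all function forms here are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(reward, next_state, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Compute the TD3 critic target: r + gamma * min(Q1', Q2')."""
    # Toy deterministic target policy (stand-in for the actor network).
    action = np.tanh(0.5 * next_state)
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    action = np.clip(action + noise, -1.0, 1.0)
    # Two toy critics (stand-ins for the twin Q-networks).
    q1 = 1.0 * next_state + 0.5 * action
    q2 = 0.9 * next_state + 0.6 * action
    # Clipped double-Q: take the smaller estimate to curb overestimation.
    return float(reward + gamma * min(q1, q2))

y = td3_target(reward=1.0, next_state=0.3)
```

In the full algorithm both critics are regressed toward this single target, and the actor is updated less frequently than the critics (the "delayed" part of the name).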
Faathir Chikal Asyuraa
Abstract:
The Multi-Armed Bandit problem is a reinforcement learning problem focused on experimental design: given a set of options, called arms, that can be chosen repeatedly, how should one balance exploring the available arms to gather information against exploiting the seemingly best arm to maximize reward? This makes the Multi-Armed Bandit a more dynamic alternative to a randomized trial. One application is choosing which film artwork to show to attract a visitor to watch the film. A Bernoulli distribution with parameter θ is chosen to model the visitor's response after seeing the artwork. A non-stationary θ can accommodate periods in which different artworks perform best; in this study the non-stationarity is modeled as piecewise-stationary, meaning θ may change value but remains constant within each defined period. Several policies are evaluated: Epsilon Greedy, SoftMax, Upper Confidence Bounds, Thompson Sampling, Sliding Window Upper Confidence Bounds, Discounted Upper Confidence Bounds, and Discounted Thompson Sampling. Simulations under a range of conditions test the performance of these policies; based on the simulations, the Discounted Thompson Sampling policy performs very well under both stationary and piecewise-stationary conditions.
Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2020
S-pdf
UI - Skripsi Membership  Universitas Indonesia Library
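Discounted Thompson Sampling, the policy the thesis above found strongest, keeps a Beta posterior per arm but geometrically decays past evidence, so the posterior can track a θ that changes between periods. A small self-contained sketch of the idea (the discount factor, phase length, and arm probabilities are illustrative choices, not the thesis's settings):

```python
import numpy as np

def discounted_thompson(theta_per_phase, phase_len=500, gamma=0.95, seed=1):
    """Run Discounted Thompson Sampling on piecewise-stationary Bernoulli arms.

    theta_per_phase: one row of per-arm success probabilities per phase.
    Returns the total reward collected over all phases.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(theta_per_phase[0])
    # Discounted success/failure counts per arm (Beta posterior statistics).
    alpha = np.zeros(n_arms)
    beta = np.zeros(n_arms)
    total = 0
    for phase in theta_per_phase:
        for _ in range(phase_len):
            # Sample one plausible mean per arm, play the best sample.
            samples = rng.beta(alpha + 1.0, beta + 1.0)
            arm = int(np.argmax(samples))
            reward = rng.random() < phase[arm]
            # Discount all past counts, then add the new observation.
            alpha *= gamma
            beta *= gamma
            alpha[arm] += reward
            beta[arm] += 1 - reward
            total += reward
    return int(total)

# Two phases: the best arm switches from arm 0 to arm 1 halfway through.
wins = discounted_thompson([[0.8, 0.2], [0.2, 0.8]])
```

Because the effective sample size is capped at roughly 1/(1-γ), old observations stop dominating the posterior soon after a change point, which is what lets the policy re-explore after θ shifts.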
Annisa Khoirul Mumtaza
Abstract:
The coupled-tank system is an example of an industrial level-control application with complex, highly nonlinear characteristics. An appropriate control method must be selected for the coupled-tank system to deliver high-precision performance. Reinforcement Learning (RL) has attracted great interest from researchers in recent years, but the technology is still rarely applied in practice to industrial process control. In this research, a level-control system for a coupled-tank system is built using Reinforcement Learning with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. RL requires a carefully designed reward function for the agent-training process, and that reward function must first be tuned through trial and error. The resulting TD3 water-level control of the coupled-tank system achieves fast rise time, settling time, and peak time, with a steady-state error that is very small and close to 0%.
Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023
S-pdf
UI - Skripsi Membership  Universitas Indonesia Library
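The reward function that the thesis above says must be tuned by trial and error is typically a penalty on level error, often with a penalty on aggressive actuator moves and a bonus for staying near the setpoint. A generic shaping example along those lines (the weights, the 1% band, and the function names are illustrative, not the thesis's design):

```python
def level_reward(h, h_ref, u_change, w_err=1.0, w_du=0.1):
    """Penalize level error and abrupt valve moves; bonus near the setpoint."""
    err = abs(h - h_ref)
    reward = -w_err * err - w_du * abs(u_change)
    if err < 0.01 * h_ref:  # within a 1% band around the setpoint
        reward += 1.0
    return reward

r_far = level_reward(h=30.0, h_ref=50.0, u_change=0.0)
r_near = level_reward(h=50.2, h_ref=50.0, u_change=0.0)
```

Tuning amounts to rebalancing these terms: too large a `w_du` makes the agent sluggish, too small a band bonus leaves steady-state error unpenalized in practice.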
Fathan Akbar Rahmani
Abstract:
This research is a simulation study of the temperature and relative-humidity control system in the central air conditioner of a textile factory. Control uses Reinforcement Learning (RL) with the Proximal Policy Optimization (PPO) algorithm. The RL agent was designed, and its data collected, using the RL Designer toolbox in MATLAB. The PPO agent was trained to control the system over a temperature range of 18 °C to 25 °C and a relative-humidity range of 55% to 85%. The trained agent's performance was measured against a PI controller using step-response parameters: rise time, settling time, steady-state error, and overshoot. Across all test iterations, the RL temperature and relative-humidity control achieved rise and settling times under 90 seconds, percent overshoot in the range of 0% to 150%, and steady-state error below 1%.
Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023
S-pdf
UI - Skripsi Membership  Universitas Indonesia Library
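The core of the PPO algorithm named above is the clipped surrogate objective: the policy probability ratio is clipped so a single update cannot move the policy far from the one that collected the data. A toy NumPy illustration of that objective (the ratios and advantages are made-up numbers, not from the study):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum makes the objective pessimistic: pushing the
    # ratio past the clip range yields no additional objective gain.
    return float(np.mean(np.minimum(unclipped, clipped)))

# A ratio of 2.0 with positive advantage is capped at (1 + eps) * A = 1.2.
obj = ppo_clip_objective([2.0, 0.5, 1.0], [1.0, -1.0, 0.5])
```

Here the first sample is capped at 1.2, the second (negative advantage, shrunken ratio) is held at -0.8 rather than -0.5, and the third is unchanged, so the mean is 0.3.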
Alexander
Abstract:
Efficient use of energy is essential to cope with today's increasing energy demand. This research aims to optimize energy use, particularly in trains, by applying the Deep Deterministic Policy Gradient (DDPG) algorithm in a multi-agent setting. The literature has demonstrated DDPG's ability to handle problems with continuous action spaces, but DDPG is known to be sensitive to hyperparameter variation and to require substantial computational resources to find an optimal policy. This research studies the impact of hyperparameter variation and selects appropriate values when applying Multi-Agent DDPG to optimize a train driving system.
Depok: Fakultas Teknik Universitas Indonesia, 2023
S-pdf
UI - Skripsi Membership  Universitas Indonesia Library
Muhammad Ziyad Ain Nur Rafif
Abstract:
The coupled-tank system is a configuration used in industry for water-level control, usually with a proportional-integral-derivative (PID) controller. Other methods, such as reinforcement learning (RL), can also be applied. RL can be combined with the programmable logic controllers (PLCs) commonly used in industrial processes: the PLC controls the water level by reading data from a water-level transmitter and setting the control-valve opening according to a trained RL algorithm. The RL algorithm used is the twin-delayed deep deterministic (TD3) policy gradient. Its performance is measured by overshoot, rise time, settling time, and steady-state error, and compared with a conventional PID controller. Simulation and hardware tests show the RL controller achieving an overshoot of 6.59% and a steady-state error of 3.53%; the steady-state error arises because the sensitive sensor never records the water level as constant and stable. By comparison, the PID controller shows an overshoot of about 23.38% and a smallest steady-state error of around 7.15%, so the RL controller outperforms the PID controller.
Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024
S-pdf
UI - Skripsi Membership  Universitas Indonesia Library
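Overshoot and steady-state-error figures like those quoted in the abstracts above can be computed directly from a recorded step response. A small sketch with a synthetic underdamped response (the setpoint and signal shape are invented for illustration; the tail-averaging window is an assumption):

```python
import numpy as np

def step_metrics(y, setpoint):
    """Percent overshoot and percent steady-state error of a step response."""
    overshoot = max(0.0, (float(np.max(y)) - setpoint) / setpoint * 100.0)
    # Steady-state value estimated from the last 10% of the recording.
    y_ss = float(np.mean(y[-len(y) // 10:]))
    ss_error = abs(setpoint - y_ss) / setpoint * 100.0
    return overshoot, ss_error

# Synthetic second-order-like response settling toward a 50 cm setpoint.
t = np.linspace(0.0, 10.0, 1000)
y = 50.0 * (1.0 - np.exp(-t) * np.cos(2.0 * t))
ov, ess = step_metrics(y, 50.0)
```

Rise time and settling time would be read off the same trace, as the first crossing of a threshold band and the last exit from a tolerance band respectively.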
Budhitama Subagdja
Abstract:
One of the fundamental challenges in reinforcement learning is to set up a proper balance between exploration and exploitation to obtain the maximum cumulative reward in the long run. Most exploration protocols bound the overall values to a convergent level of performance. If new knowledge is inserted or the environment suddenly changes, the issue becomes more intricate, as exploration must be reconciled with the pre-existing knowledge. This paper presents a type of multi-channel adaptive resonance theory (ART) neural network model called fusion ART, which serves as a fuzzy approximator for reinforcement learning with inherent features that can regulate the exploration strategy. This intrinsic regulation is driven by the state of the knowledge the agent has learned so far. The model offers stable but incremental reinforcement learning that can incorporate prior rules as bootstrap knowledge to guide the agent toward the right action. Experiments on obstacle-avoidance and navigation tasks demonstrate that, when the agent learns from scratch, the inherent exploration model in fusion ART is comparable to the basic ε-greedy policy. The model is also shown to handle prior knowledge and to strike a balance between exploration and exploitation.
Nanyang Technological University, Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, 2016
PDF
Artikel Jurnal  Universitas Indonesia Library
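The ε-greedy baseline that the fusion ART model is compared against fits in a few lines: with probability ε take a random action, otherwise exploit the action with the highest current Q-value. A generic sketch, not the paper's implementation:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(42)
q = np.array([0.1, 0.9, 0.4])
# With epsilon = 0 the choice is always the greedy arm (index 1).
picks = [epsilon_greedy(q, 0.0, rng) for _ in range(10)]
```

The contrast with fusion ART's intrinsic regulation is that ε here is a fixed external knob, whereas the paper's model adjusts exploration from the state of the knowledge learned so far.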