Policy gradient methods for reinforcement learning can readily be applied to problem classes that are difficult for value-function approaches, such as POMDP environments. However, policy gradient methods do not always learn the optimal policy when the task has more than one local optimum. In this paper, we propose a new algorithm, based on policy gradient methods, that learns the optimal policy more robustly than traditional policy gradient methods. Our method is a multi-point search that maintains multiple policy parameter sets; improving the policy from various parameters increases the probability of learning the optimal policy. Importance sampling enables the policy gradient method to estimate the gradient for all parameter sets from a single stream of experience. Furthermore, we introduce an additional policy that selects the best parameter set from among the policy parameter sets; this selection policy is itself improved by policy gradient methods so as to maximize agent performance. We develop an algorithm combining these techniques and demonstrate its effectiveness through simulations of a binary tree task and a locomotion task on a crawling robot.
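To illustrate the central mechanism described above, the following is a minimal sketch (not the authors' implementation) of estimating REINFORCE-style gradients for several policy parameter sets from a single trajectory collected by one behaviour policy, using per-trajectory importance weights. The softmax bandit-like policy, toy reward, number of parameter sets, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4

def softmax_probs(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def sample_trajectory(theta, horizon=10):
    """Roll out the behaviour policy; each step is an independent bandit pull (toy task)."""
    actions, rewards = [], []
    probs = softmax_probs(theta)
    for _ in range(horizon):
        a = rng.choice(n_actions, p=probs)
        r = 1.0 if a == 0 else 0.1   # toy reward: action 0 is optimal
        actions.append(a)
        rewards.append(r)
    return actions, rewards

def grad_log_pi(theta, a):
    """Gradient of log softmax policy with respect to theta."""
    g = -softmax_probs(theta)
    g[a] += 1.0
    return g

# Several candidate parameter sets (multi-point search); one of them acts as
# the behaviour policy that actually generates the experience.
thetas = [rng.normal(size=n_actions) for _ in range(3)]
behaviour = thetas[0].copy()

actions, rewards = sample_trajectory(behaviour)
ret = sum(rewards)

for k, theta in enumerate(thetas):
    # Importance weight: probability of the observed action sequence under theta_k
    # divided by its probability under the behaviour policy.
    log_w = sum(np.log(softmax_probs(theta)[a]) - np.log(softmax_probs(behaviour)[a])
                for a in actions)
    w = np.exp(log_w)
    # Importance-weighted REINFORCE gradient estimate for parameter set k.
    grad = w * ret * sum(grad_log_pi(theta, a) for a in actions)
    thetas[k] = theta + 0.01 * grad   # one gradient-ascent step on this parameter set
```

In this sketch every parameter set receives a gradient estimate from the same experience, which is the point of the single-pass importance sampling step; the additional selection policy over parameter sets described in the abstract is not shown.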