
Article Information

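  • Title: 多峰性景観下での直接政策探索:Population-based Policy Gradient法の提案 (Direct Policy Search in Multimodal Landscapes: Proposal of the Population-based Policy Gradient Method)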
  • Authors: 宮前 惇; 永田 裕一; 小野 功
  • Journal: 進化計算学会論文誌
  • Electronic ISSN: 2185-7385
  • Year: 2011
  • Volume: 2
  • Issue: 1
  • Pages: 1-11
  • DOI: 10.11394/tjpnsec.2.1
  • Publisher: The Japanese Society for Evolutionary Computation
  • Abstract:

    Policy gradient methods for reinforcement learning can be readily applied to problem classes that are difficult for value-function approaches, such as POMDP environments. However, policy gradient methods do not always learn an optimal policy when the task has more than one locally optimal solution. In this paper, we propose a new algorithm based on policy gradient methods that learns an optimal policy more robustly than traditional policy gradient methods. Our method is a multi-point search method that maintains multiple policy parameter sets. Improving from several different policy parameters increases the probability of learning an optimal policy. In addition, importance sampling techniques enable the policy gradient method to estimate the gradient for all parameter sets from a single pass of experience. Furthermore, we introduce an additional policy that selects the best parameter set from among the policy parameter sets. This additional policy is itself improved by a policy gradient method so as to maximize the agent's performance. We develop the algorithm using these techniques and demonstrate its effectiveness through simulations of a binary tree task and a locomotion task on a crawling robot.
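
    The importance-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single-step (bandit) task, a softmax policy over discrete actions, and hypothetical names (`softmax`, `is_policy_gradients`, `exact_gradient`). The additional policy that selects among parameter sets is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_policy_gradients(population, behavior_logits, actions, rewards):
    """Estimate a REINFORCE-style gradient for every parameter set in
    `population`, reusing actions sampled under one behavior policy.

    Assumption (not from the paper): each episode is a single action,
    and each policy is a softmax over its own logits.
    """
    pi_b = softmax(behavior_logits)
    n = len(actions)
    grads = []
    for logits in population:
        pi = softmax(logits)
        g = [0.0] * len(logits)
        for a, r in zip(actions, rewards):
            # Importance weight: target probability / behavior probability.
            w = pi[a] / pi_b[a]
            for k in range(len(logits)):
                # Derivative of log pi(a) w.r.t. logit k for a softmax policy.
                dlog = (1.0 if k == a else 0.0) - pi[k]
                g[k] += w * r * dlog / n
        grads.append(g)
    return grads


def exact_gradient(logits, reward_of_action):
    """Exact gradient of expected reward for a softmax bandit policy."""
    pi = softmax(logits)
    k_range = range(len(logits))
    return [
        sum(pi[a] * reward_of_action[a] * ((1.0 if k == a else 0.0) - pi[k])
            for a in k_range)
        for k in k_range
    ]


# Three hypothetical parameter sets share the same two sampled actions.
population = [[0.0, 0.0], [1.0, -1.0], [-0.5, 2.0]]
grads = is_policy_gradients(population, [0.0, 0.0], [0, 1], [1.0, 3.0])
```

    In this toy setup the behavior policy is uniform and each action appears once, so the weighted estimate for every parameter set coincides with `exact_gradient`. With randomly sampled actions the estimate would only match in expectation, and its variance grows as a target policy drifts away from the behavior policy.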
