Publisher: The Japanese Society for Artificial Intelligence
Abstract: Reinforcement learning (RL) methods are promising because they can learn useful behavior by trial and error, based on rewards from the environment. This paper tackles a more difficult problem than those addressed by many ordinary RL methods: RL in partially observable Markov decision process (POMDP) environments with multiple rewards.