Policy gradient methods for reinforcement learning can readily be applied to problem classes that are difficult for value-function approaches, such as POMDP environments. However, policy gradient methods do not always learn the optimal policy when the task has more than one local optimum. In this paper, we propose a new algorithm, based on policy gradient methods, that learns the optimal policy more robustly than traditional policy gradient methods. Our method is a multi-point search that maintains multiple policy parameter sets; improving the policy from various parameters increases the probability of learning the optimal policy. Importance sampling enables the policy gradient method to estimate the gradient for all parameter sets from a single stream of experience. Furthermore, we introduce an additional policy that selects the best parameter set from among the policy parameter sets; this selection policy is itself improved by policy gradient methods so as to maximize agent performance. We develop an algorithm combining these techniques and demonstrate its effectiveness through simulations of a binary tree task and a locomotion task on a crawling robot.
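To illustrate the central mechanism described above, the following is a minimal sketch (not the authors' implementation) of estimating REINFORCE-style gradients for several policy parameter sets from a single trajectory collected by one behaviour policy, using per-trajectory importance weights. The softmax bandit-like policy, toy reward, number of parameter sets, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4

def softmax_probs(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def sample_trajectory(theta, horizon=10):
    """Roll out the behaviour policy; each step is an independent bandit pull (toy task)."""
    actions, rewards = [], []
    probs = softmax_probs(theta)
    for _ in range(horizon):
        a = rng.choice(n_actions, p=probs)
        r = 1.0 if a == 0 else 0.1   # toy reward: action 0 is optimal
        actions.append(a)
        rewards.append(r)
    return actions, rewards

def grad_log_pi(theta, a):
    """Gradient of log softmax policy with respect to theta."""
    g = -softmax_probs(theta)
    g[a] += 1.0
    return g

# Several candidate parameter sets (multi-point search); one of them acts as
# the behaviour policy that actually generates the experience.
thetas = [rng.normal(size=n_actions) for _ in range(3)]
behaviour = thetas[0].copy()

actions, rewards = sample_trajectory(behaviour)
ret = sum(rewards)

for k, theta in enumerate(thetas):
    # Importance weight: probability of the observed action sequence under theta_k
    # divided by its probability under the behaviour policy.
    log_w = sum(np.log(softmax_probs(theta)[a]) - np.log(softmax_probs(behaviour)[a])
                for a in actions)
    w = np.exp(log_w)
    # Importance-weighted REINFORCE gradient estimate for parameter set k.
    grad = w * ret * sum(grad_log_pi(theta, a) for a in actions)
    thetas[k] = theta + 0.01 * grad   # one gradient-ascent step on this parameter set
```

In this sketch every parameter set receives a gradient estimate from the same experience, which is the point of the single-pass importance sampling step; the additional selection policy over parameter sets described in the abstract is not shown.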