Monte-Carlo Tree Search

Exploiting Continuity of Rewards

Finding exact solutions of POMDPs is generally computationally intractable, but solutions can be approximated with sampling-based approaches. These approaches rely on multi-armed bandit (MAB) heuristics, which assume that the outcomes of different actions are uncorrelated. In some applications, such as motion planning in continuous spaces, similar actions yield similar outcomes. We use MAB variants that make Lipschitz continuity assumptions on the outcomes of actions to improve the efficiency of sampling-based planning approaches.
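As an illustration, the following is a minimal sketch of such a Lipschitz-aware bandit rule (the class and parameter names are ours, not from the original implementation): the upper confidence bound of each discretized action is tightened using the bounds of neighboring actions under an assumed Lipschitz constant.

```python
import numpy as np

# Minimal sketch (assumed, not the authors' implementation): a UCB-style
# bandit over a discretized continuous action space that exploits a
# Lipschitz assumption |E[r(a)] - E[r(a')]| <= L * |a - a'| by tightening
# each arm's upper bound with information from neighboring arms.
class LipschitzUCB:
    def __init__(self, actions, lipschitz_const=1.0, c=1.0):
        self.actions = np.asarray(actions)   # discretized action set
        self.L = lipschitz_const             # assumed Lipschitz constant
        self.c = c                           # exploration weight
        self.counts = np.zeros(len(self.actions))
        self.means = np.zeros(len(self.actions))
        self.t = 0

    def select(self):
        self.t += 1
        # Play each arm once before applying the bound.
        untried = np.where(self.counts == 0)[0]
        if len(untried) > 0:
            return int(untried[0])
        bonus = self.c * np.sqrt(np.log(self.t) / self.counts)
        ucb = self.means + bonus
        # Lipschitz tightening: the value at action a cannot exceed
        # ucb(a') + L * |a - a'| for any other arm a'.
        dist = np.abs(self.actions[:, None] - self.actions[None, :])
        tightened = np.min(ucb[None, :] + self.L * dist, axis=1)
        return int(np.argmax(tightened))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```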

Q-value profile of a scenario without considering uncertainty, and with uncertainty.
Gaussian process regression applied to a Q-value profile estimated by sampling 100 particles.
Convergence analysis, inspecting the mean absolute error of a scenario for different numbers of actions.
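For intuition, the sketch below fits a Gaussian process to noisy Q-value estimates over a one-dimensional action space, mirroring the regression shown above; the sampling routine and parameters are illustrative assumptions, not the evaluated setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def q_sample(action, rng, n_particles=100):
    # Hypothetical noisy Q-value estimate obtained by averaging particle returns.
    returns = -(action - 0.3) ** 2 + 0.1 * rng.standard_normal(n_particles)
    return returns.mean()

rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 15).reshape(-1, 1)
q_values = np.array([q_sample(a.item(), rng) for a in actions])

# Smooth the sampled Q-value profile with a GP (RBF kernel plus noise term).
kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel).fit(actions, q_values)

query = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
mean, std = gp.predict(query, return_std=True)  # smoothed profile with uncertainty
```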

Information Particle Filter Tree

Planning in POMDPs inherently gathers the information necessary to act optimally under uncertainty. The framework can be extended to model pure information-gathering tasks by considering belief-based rewards. This allows reward shaping to guide POMDP planning toward informative beliefs, using a weighted combination of the original reward and the expected information gain as the objective. We propose an online algorithm, the Information Particle Filter Tree (IPFT), to solve problems with belief-dependent rewards on continuous domains.
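A minimal sketch of this kind of reward shaping is shown below, assuming particle beliefs and a kernel density estimate of their entropy; the function names and the entropy estimator are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

def particle_entropy(particles, n_samples=200):
    # Monte-Carlo estimate of differential entropy via a KDE fit to the particles.
    kde = gaussian_kde(particles.T)
    samples = kde.resample(n_samples)
    return -np.mean(np.log(kde(samples) + 1e-12))

def shaped_reward(state_reward, belief_before, belief_after, lambda_info=1.0):
    # Shaped objective rho(b, a, b') = r(b, a) + lambda_info * (H(b) - H(b')),
    # where the entropy reduction approximates the expected information gain.
    info_gain = particle_entropy(belief_before) - particle_entropy(belief_after)
    return state_reward + lambda_info * info_gain
```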

Continuous light-dark problem, which is one of the benchmark problems utilized in our evaluation.
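For reference, a minimal generative model of a continuous light-dark domain might look as follows; the specific constants and dynamics are assumptions for illustration, not the exact benchmark configuration.

```python
import numpy as np

# Illustrative continuous light-dark domain: the agent moves along a line,
# observations of its position are noisy, and the noise shrinks near the
# "light" region, so moving toward the light first is informative even
# though the goal lies elsewhere.
LIGHT, GOAL = 5.0, 0.0

def transition(state, action, rng, motion_noise=0.1):
    return state + action + motion_noise * rng.standard_normal()

def observe(state, rng):
    # Observation noise grows with the distance to the light source.
    noise_std = 0.5 * abs(state - LIGHT) + 0.1
    return state + noise_std * rng.standard_normal()

def reward(state, action):
    return -abs(state - GOAL) - 0.1 * abs(action)
```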

The evaluation shows that considering the information gain greatly improves performance in problems where information gathering is an essential part of the optimal policy.


Publications on this topic

Johannes Fischer, Ömer Sahin Tas. Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains. In Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria, July 2020.