Safe and Interpretable Exploration

Active Information Gathering

POMCP proves to be inefficient in continuous action and observation spaces. It furthermore cannot account for belief-dependent information rewards, which steer sampling toward more informative actions. For this reason, we employ the Information Particle Filter Tree (IPFT) algorithm, a Monte Carlo tree search (MCTS) based approach, to guide the agent to gather more information and make better-informed decisions.
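As a minimal sketch of such a belief-dependent reward, assuming a particle belief representation and a kernel density entropy estimator (the function names below are illustrative, not from the published implementation):

    import numpy as np
    from scipy.stats import gaussian_kde

    def particle_entropy(particles):
        # Estimate the differential entropy of a particle belief with a
        # kernel density estimate; other estimators are equally valid.
        data = np.asarray(particles).T
        kde = gaussian_kde(data)
        return -np.mean(kde.logpdf(data))

    def shaped_reward(belief, next_belief, state_reward, lam=1.0):
        # Weighted combination of the original reward and the entropy
        # reduction between consecutive beliefs (information gain).
        info_gain = particle_entropy(belief) - particle_entropy(next_belief)
        return state_reward + lam * info_gain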


A POMDP search tree with and without active information gathering. With active information gathering, the Q-values include belief-based information rewards, yielding a substantially different Q-value profile.

Passive Information Gathering

Motion planning in urban driving involves deciding among combinatorial maneuver variants. A planner must consider the uncertainties and associated risks of these variants and subsequently select one alternative. Sometimes, however, the uncertainty is so high that no reasonable decision is possible. In such cases, the planner should postpone the combinatorial decision to a later time and drive a maneuver-neutral trajectory, expecting that more information will become available. In this way, motion that is safe but not overly defensive is obtained.
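One possible decision rule for when to postpone is sketched below, assuming the belief over maneuver variants is given as a discrete distribution (all names are illustrative, and the trajectory blend is a simplification; a real planner would optimize one trajectory against the probability-weighted costs):

    import numpy as np

    def plan_step(hypothesis_probs, candidate_trajs, entropy_threshold=0.5):
        # Commit to the most likely maneuver only if the belief over the
        # variants is sufficiently peaked; assumes at least two variants.
        p = np.asarray(hypothesis_probs)
        entropy = -np.sum(p * np.log(p + 1e-12)) / np.log(len(p))  # in [0, 1]
        if entropy < entropy_threshold:
            return candidate_trajs[int(np.argmax(p))]
        # Postpone the decision: a probability-weighted blend stays
        # maneuver-neutral and keeps every variant feasible.
        return np.tensordot(p, np.asarray(candidate_trajs), axes=1)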


A use case highlighting the benefit of planning neutral trajectories. Environment information is often quite noisy and tends to contain false-positive object detections. State-of-the-art motion planners treat all objects alike and thus produce overcautious behavior. My planning approach considers alternative maneuvers in a combined fashion and plans a motion shaped by the probabilities of those alternatives. The proposed planner can smoothly react to objects with low existence probability while remaining collision-free in case their existence is substantiated.
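To make the probability weighting concrete, the following sketch scales each object's collision penalty by its existence probability (the cost form and field names are assumptions for illustration, not the planner's actual cost function):

    import numpy as np

    def existence_weighted_cost(traj, objects):
        # Scale each object's proximity penalty by its existence
        # probability, so unlikely detections bend the trajectory only
        # slightly instead of triggering full avoidance maneuvers.
        cost = 0.0
        for obj in objects:
            d_min = np.linalg.norm(traj - obj["position"], axis=-1).min()
            cost += obj["p_exist"] * np.exp(-d_min)  # soft proximity penalty
        return cost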


Information Particle Filter Tree

Planning in POMDPs inherently gathers the information necessary to act optimally under uncertainty. The framework can be extended to model pure information-gathering tasks by considering belief-based rewards. This allows reward shaping to guide POMDP planning toward informative beliefs, using a weighted combination of the original reward and the expected information gain as the objective. We propose an online algorithm, the Information Particle Filter Tree (IPFT), to solve problems with belief-dependent rewards on continuous domains.
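The core recursion can be outlined as follows. This is a hedged sketch, not the published implementation: the tree and gen_model interfaces are assumptions, and particle_entropy is the estimator sketched earlier.

    def simulate(belief, depth, tree, gen_model, lam=0.5, gamma=0.95):
        # One IPFT rollout step: MCTS over small particle beliefs, with an
        # entropy-based information term added to the sampled reward.
        if depth == 0:
            return 0.0
        action = tree.select_action(belief)  # e.g. UCB over tried actions
        # Propagate the particle set through the generative model and
        # reweight it with a sampled observation (particle filter update).
        next_belief, mean_reward = gen_model.update(belief, action)
        info_gain = particle_entropy(belief) - particle_entropy(next_belief)
        value = mean_reward + lam * info_gain + gamma * simulate(
            next_belief, depth - 1, tree, gen_model, lam, gamma)
        tree.update_stats(belief, action, value)
        return value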

The evaluation in various settings shows that considering information gain greatly improves performance in problems where information gathering is an essential part of the optimal policy. Belief states for the continuous light-dark problem are shown below, together with the corresponding action sequence.

Figures: MCTS on the continuous light-dark problem (belief states and corresponding actions).

Publications on this topic

Ömer Şahin Taş. Motion Planning for Autonomous Vehicles in Partially Observable Environments. Ph.D. thesis, Karlsruhe Institute of Technology, 2022.
Johannes Fischer, Ömer Şahin Taş. Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains. In International Conference on Machine Learning (ICML), 2020.
Ömer Şahin Taş, Christoph Stiller. Tackling Existence Probabilities of Objects with Motion Planning for Automated Urban Driving. In Workshops of the Robotics: Science and Systems (RSS), 2020.