Safe and Interpretable Exploration
Active Information Gathering
POMCP is inefficient in continuous action and observation spaces. It also cannot account for belief-dependent information rewards, which steer sampling toward more rewarding actions. For this reason, we employ the Information Particle Filter Tree (IPFT) algorithm, a Monte Carlo tree search (MCTS) based approach, to guide the agent to gather more information and make better-informed decisions.

POMDP search tree with and without active information gathering. The Q-values of the search tree with active information gathering include belief-based information rewards. Therefore, its Q-value profile is substantially different.
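
To make the idea of a belief-based information reward more concrete, here is a minimal sketch that measures information gain as the entropy reduction between two weighted particle beliefs and adds it to the ordinary state reward. The discrete entropy over particle weights, the function names, and the trade-off weight `lam` are assumptions made for illustration. They are not taken from the planner described here, which may estimate entropy differently in continuous spaces.

```python
import math


def belief_entropy(weights):
    """Shannon entropy (in nats) of a particle belief given its weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0.0]
    return -sum(p * math.log(p) for p in probs)


def information_reward(weights_before, weights_after):
    """Entropy reduction between two beliefs (positive means information gained)."""
    return belief_entropy(weights_before) - belief_entropy(weights_after)


def augmented_reward(state_reward, weights_before, weights_after, lam=0.5):
    """State reward plus a weighted information term.

    lam is a hypothetical trade-off weight, not a value from the source.
    """
    return state_reward + lam * information_reward(weights_before, weights_after)


if __name__ == "__main__":
    prior = [1.0, 1.0, 1.0, 1.0]      # uniform belief over four particles
    posterior = [0.7, 0.1, 0.1, 0.1]  # belief after an informative observation
    print(augmented_reward(-1.0, prior, posterior))
```

Backing up such an augmented reward through the search tree is what makes the Q-values depend on how much an action is expected to sharpen the belief, not only on the state reward it collects.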
Passive Information Gathering
Motion planning in urban driving involves deciding among combinatorial maneuver variants. A planner must weigh the uncertainties and associated risks of these variants and then select one of them. Sometimes, however, the uncertainty is so high that no reasonable decision is possible. In such cases, the planner should postpone the combinatorial decision to a later time and drive a maneuver-neutral trajectory, expecting that more information will become available. In this way, the resulting motion is safe without being overly defensive.
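
The postponement logic described above can be illustrated with a simple confidence threshold: commit to a maneuver only when its probability is high enough, otherwise fall back to a neutral trajectory. This is a sketch under stated assumptions. The threshold value, the maneuver names, and the `select_maneuver` function are hypothetical, and the source does not specify the criterion the planner actually uses.

```python
NEUTRAL = "neutral"  # placeholder label for the maneuver-neutral trajectory


def select_maneuver(maneuver_probs, commit_threshold=0.8):
    """Commit to the most likely maneuver only if it is likely enough.

    maneuver_probs maps a maneuver name to its probability.
    commit_threshold is a hypothetical tuning parameter.
    """
    best = max(maneuver_probs, key=maneuver_probs.get)
    if maneuver_probs[best] >= commit_threshold:
        return best
    # Uncertainty is too high: postpone the decision and stay neutral.
    return NEUTRAL


if __name__ == "__main__":
    print(select_maneuver({"pass_left": 0.9, "yield": 0.1}))
    print(select_maneuver({"pass_left": 0.55, "yield": 0.45}))
```

A hard threshold is the simplest possible rule. A planner that blends the alternatives by their probabilities would react more smoothly than this switch does.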

A use case highlighting the benefit of planning neutral trajectories. Environment information is often noisy and tends to contain false-positive object detections. State-of-the-art motion planners treat all objects alike, which produces overcautious behavior. My planning approach considers alternative maneuvers in a combined fashion and plans a motion that is shaped by the probabilities of those alternatives. The proposed planner can react smoothly to objects with low existence probability while remaining collision-free should their existence be confirmed.
Information Particle Filter Tree

Evaluation in various settings shows that considering information gain greatly improves performance in problems where information gathering is an essential part of the optimal policy. Belief states for the continuous light-dark problem are shown below, and the corresponding action sequence is shown on the right.
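
To illustrate why information gathering matters in a light-dark setting, the sketch below updates a one-dimensional particle belief with an observation whose noise grows with distance from the light. The noise model, the light position, and all function names are assumptions chosen for illustration and are not the exact setup used in the evaluation.

```python
import math

LIGHT_POSITION = 5.0  # hypothetical location of the well-lit region


def observation_std(x):
    """Assumed noise model: observations get noisier away from the light."""
    return 0.5 + abs(x - LIGHT_POSITION)


def gaussian_pdf(z, mean, std):
    """Density of a normal distribution with the given mean and std at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def update_weights(particles, weights, observation):
    """Reweight particles by the observation likelihood and normalize."""
    new = [w * gaussian_pdf(observation, x, observation_std(x))
           for x, w in zip(particles, weights)]
    total = sum(new)
    return [w / total for w in new]


def belief_variance(particles, weights):
    """Variance of the weighted particle belief."""
    mean = sum(w * x for x, w in zip(particles, weights))
    return sum(w * (x - mean) ** 2 for x, w in zip(particles, weights))


if __name__ == "__main__":
    uniform = [0.2] * 5
    in_light = [4.0, 4.5, 5.0, 5.5, 6.0]   # particles near the light
    in_dark = [-1.0, -0.5, 0.0, 0.5, 1.0]  # particles far from the light
    print(belief_variance(in_light, update_weights(in_light, uniform, 5.0)))
    print(belief_variance(in_dark, update_weights(in_dark, uniform, 0.0)))
```

With this assumed noise model, the same kind of observation narrows the belief far more near the light than in the dark, which is why a policy that first moves toward the light can localize itself before heading to the goal.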

