Exploiting Continuity of Rewards

Finding exact solutions to POMDPs is generally computationally intractable, but solutions can be approximated with sampling-based approaches. These approaches rely on multi-armed bandit (MAB) heuristics, which assume the outcomes of different actions are uncorrelated. In some applications, such as motion planning in continuous spaces, similar actions yield similar outcomes. We use MAB variants that place Lipschitz continuity assumptions on the action outcomes to improve the efficiency of sampling-based planning.
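
As an illustration, the following is a minimal sketch of a Lipschitz-aware upper-confidence-bound rule over a discretized one-dimensional action space: each arm's bound is capped by the best neighbouring bound plus the Lipschitz constant times the action distance, so information propagates between similar actions. The reward profile, Lipschitz constant, and discretization are illustrative assumptions and not the exact algorithm of the paper.

```python
import numpy as np

def lipschitz_ucb(rewards, pulls, actions, L, t, c=2.0):
    """Upper confidence bounds tightened by Lipschitz continuity:
    an arm's bound cannot exceed any other arm's bound plus L times
    the distance between the two actions."""
    means = rewards / np.maximum(pulls, 1)
    conf = np.sqrt(c * np.log(t + 1) / np.maximum(pulls, 1))
    naive = np.where(pulls > 0, means + conf, np.inf)
    dist = np.abs(actions[:, None] - actions[None, :])
    # b_i = min_j (naive_j + L * |a_i - a_j|)
    return np.min(naive[None, :] + L * dist, axis=1)

rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 21)                    # discretized continuous actions
true_reward = lambda a: np.exp(-4.0 * (a - 0.3) ** 2)   # hypothetical smooth Q-profile
rewards = np.zeros_like(actions)
pulls = np.zeros_like(actions)

for t in range(500):
    bounds = lipschitz_ucb(rewards, pulls, actions, L=2.0, t=t)
    i = int(np.argmax(bounds))
    r = true_reward(actions[i]) + 0.1 * rng.standard_normal()
    rewards[i] += r
    pulls[i] += 1

best = actions[int(np.argmax(rewards / np.maximum(pulls, 1)))]
print("best action estimate:", best)
```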

Figures (multi-hypothesis planning): Q-value profiles of a scenario with and without considering uncertainty; Gaussian process regression applied to a Q-value profile sampled with 100 particles; convergence analysis via the mean absolute error for different numbers of actions.
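
The regression step in the figures can be sketched as fitting a Gaussian process to sampled Q-value estimates over actions. The snippet below uses scikit-learn with an RBF plus white-noise kernel; the sample generation and kernel parameters are hypothetical placeholders, not the setup used in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
# hypothetical Q-value samples: 100 particle rollouts at random actions
a_samples = rng.uniform(-1.0, 1.0, size=(100, 1))
q_samples = np.exp(-4.0 * (a_samples[:, 0] - 0.3) ** 2) + 0.1 * rng.standard_normal(100)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(0.01),
                              normalize_y=True)
gp.fit(a_samples, q_samples)

a_grid = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
q_mean, q_std = gp.predict(a_grid, return_std=True)   # smoothed Q-profile with uncertainty
```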

Curriculum Learning for Deep Reinforcement Learning

Many tasks in robotics require coordination among multiple agents. Deep reinforcement learning has emerged as a powerful tool for decision-making that involves complex interactions. I focus on sampling mechanisms and distillation frameworks for curriculum learning, defining tasks of increasing difficulty while avoiding catastrophic forgetting.
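
As a rough sketch of the sampling-mechanism side only (distillation and forgetting avoidance are not shown), the class below draws task difficulties from a window that shifts upward once the agent's recent success rate is high enough. The task parameterization, window size, and promotion threshold are illustrative assumptions.

```python
import numpy as np

class CurriculumSampler:
    """Samples task difficulties from a window that shifts upward as the
    agent's recent success rate on sampled tasks improves."""

    def __init__(self, min_d=0.0, max_d=1.0, window=0.2, promote_at=0.8):
        self.low, self.max_d, self.window = min_d, max_d, window
        self.promote_at = promote_at
        self.recent = []                          # recent success flags

    def sample(self, rng):
        hi = min(self.low + self.window, self.max_d)
        return rng.uniform(self.low, hi)          # task difficulty to instantiate

    def update(self, success):
        self.recent.append(float(success))
        self.recent = self.recent[-50:]
        # shift the window once the current difficulty band is mastered
        if len(self.recent) == 50 and np.mean(self.recent) > self.promote_at:
            self.low = min(self.low + 0.5 * self.window, self.max_d - self.window)
            self.recent = []

rng = np.random.default_rng(0)
sampler = CurriculumSampler()
for step in range(200):
    d = sampler.sample(rng)
    success = rng.random() > d                    # stand-in for an RL rollout outcome
    sampler.update(success)
print("current difficulty window starts at", round(sampler.low, 2))
```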

Publications on this topic

Ömer Şahin Taş, Felix Hauser, Martin Lauer. Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces. In IEEE Intelligent Vehicles Symposium (IV), 2021.