Finding exact solutions of POMDPs is generally computationally intractable, but solutions can be approximated by sampling-based approaches. These approaches rely on multi-armed bandit (MAB) heuristics, which assume that the outcomes of different actions are uncorrelated. In some applications, such as motion planning in continuous spaces, similar actions yield similar outcomes. We use variants of MAB that place Lipschitz continuity assumptions on the outcomes of actions to improve the efficiency of sampling-based planning approaches.
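To illustrate the idea, here is a minimal sketch (not the method from this work) of a UCB-style bandit over a discretized one-dimensional action space, where each arm's optimistic index is tightened by the Lipschitz bound implied by its neighbors: an arm's true value can exceed a neighbor's upper bound by at most L times the distance between them, so unpulled arms inherit informative bounds from nearby pulled arms. All names and constants are illustrative assumptions.

```python
import numpy as np

def lipschitz_ucb(reward_fn, arms, L, horizon, noise=0.1, seed=0):
    """UCB over a discretized action set, tightened with a Lipschitz
    bound: index_i = min_j (ucb_j + L * |x_i - x_j|).  Illustrative
    sketch, not a specific published algorithm."""
    rng = np.random.default_rng(seed)
    n = np.zeros(len(arms))  # pull counts per arm
    s = np.zeros(len(arms))  # cumulative reward per arm
    dist = np.abs(arms[:, None] - arms[None, :])  # pairwise distances
    for t in range(1, horizon + 1):
        # standard UCB index; unpulled arms start at +inf
        with np.errstate(divide="ignore", invalid="ignore"):
            ucb = np.where(n > 0, s / n + np.sqrt(2 * np.log(t) / n), np.inf)
        # Lipschitz tightening: borrow bounds from neighboring arms
        index = np.min(ucb[None, :] + L * dist, axis=1)
        i = int(np.argmax(index))
        s[i] += reward_fn(arms[i]) + rng.normal(0.0, noise)
        n[i] += 1
    # return the arm with the best empirical mean among pulled arms
    means = np.where(n > 0, s / np.maximum(n, 1), -np.inf)
    return float(arms[int(np.argmax(means))])
```

Without the `min` over neighbors, an unpulled arm keeps an infinite index and must be tried at least once; with it, arms close to a well-explored, poorly-performing arm can be ruled out without ever being pulled, which is the efficiency gain correlated actions make possible.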
Many tasks in robotics require coordination among multiple agents. Deep reinforcement learning has emerged as a powerful tool for making decisions that involve complex interactions. I focus on sampling mechanisms and distillation frameworks for curriculum learning, defining tasks of increasing difficulty while avoiding catastrophic forgetting.
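One common family of curriculum sampling mechanisms concentrates training on the "frontier" of task difficulty while occasionally replaying mastered tasks so they are not forgotten. The sketch below is a hypothetical illustration of that pattern, not the specific framework of this work; the weighting scheme and all parameter names are assumptions.

```python
import numpy as np

def sample_task(success_rates, rng, replay_prob=0.2):
    """Sample a task level for the next episode.
    success_rates: per-level empirical success rate, easiest level first.
    Mostly samples levels near 50% success (the learning frontier);
    with probability replay_prob, replays a mastered level (>90%
    success) to counter catastrophic forgetting.  Illustrative only."""
    rates = np.asarray(success_rates, dtype=float)
    levels = np.arange(len(rates))
    mastered = levels[rates > 0.9]
    if mastered.size > 0 and rng.random() < replay_prob:
        return int(rng.choice(mastered))  # replay a mastered level
    # weight levels by proximity of their success rate to 50%
    weights = np.exp(-((rates - 0.5) ** 2) / 0.02)
    return int(rng.choice(levels, p=weights / weights.sum()))
```

The replay branch plays the role that distillation or rehearsal plays in larger frameworks: it keeps gradient signal flowing through tasks the agent has already solved, at a small cost in time spent on the frontier.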