Algorithms for Robotic Problems

Graduate course, Dept. of Production Engineering and Management, Technical University of Crete, Spring 2009.
Instructor: Nikos Vlassis
Dates and location: Monday 14:30 - 17:00, room D3-20(E).

Summary

We will study algorithms for sequential decision making by autonomous robots in tasks involving motion and sensing uncertainty (high-level stochastic optimal control). Covered topics will include Markov decision processes, reinforcement learning (model-free control), partially observable Markov decision processes, and game-theoretic models of decision making (multi-robot coordination). We will address planning versus learning, representation, exploration, and the computational complexity of planning and learning. Each student is expected to present a published paper and to implement and demonstrate an existing algorithm on a simple problem.

Literature

Lectures

9/3 - 6/4: MDPs, values, policies, dynamic programming, Monte Carlo, TD learning --- Survey paper, B.1-8.
Assignment: Implement policy iteration using various methods on a toy MDP.
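
As a starting point for the assignment, here is a minimal sketch of policy iteration in plain Python. It is only an illustration under stated assumptions, not a reference solution: the two-state MDP, the transition table P, the discount factor gamma = 0.9, and all function names are invented for this example. Policy evaluation is done by iterative sweeps, which is just one of the possible methods; solving the linear system for the value function directly is another.

```python
# Hypothetical toy MDP (not from the course material): 2 states, 2 actions.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = [
    [  # state 0
        [(1.0, 0, 0.0)],                 # action 0: stay in state 0, no reward
        [(0.8, 1, 0.0), (0.2, 0, 0.0)],  # action 1: reach state 1 with prob. 0.8
    ],
    [  # state 1
        [(1.0, 1, 1.0)],                 # action 0: stay in state 1, reward 1
        [(1.0, 0, 0.0)],                 # action 1: go back to state 0
    ],
]


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Iterative policy evaluation with in-place sweeps over the states."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def improve_policy(P, V, gamma):
    """Greedy policy with respect to V (ties go to the lowest action index)."""
    policy = []
    for s in range(len(P)):
        q = [q_value(P, V, s, a, gamma) for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def policy_iteration(P, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P)
print(policy, V)
```

On this toy problem the sketch settles on moving from state 0 to state 1 and then staying there. With exact ties between actions and an approximate evaluation step, the stopping test on policy equality may need more care than shown here.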

27/4: Reward shaping (Ng, Harada, and Russell, 1999), and inverse RL (Ng and Russell, 2000), (Ramachandran and Amir, 2007).

4/5: POMDPs (Singh, Jaakkola, and Jordan, 1994), (Cassandra, Kaelbling, and Littman, 1994), (Spaan and Vlassis, 2005).

11/5: Monte Carlo planning/RL (Kearns, Mansour, and Ng, 1999), (Kocsis and Szepesvari, 2006).

18/5: Actor-critic algorithms (Konda and Tsitsiklis, 2000), (Sutton, McAllester, Singh, and Mansour, 2000), (Peters and Schaal, 2006).

25/5: Least-squares methods in RL (Boyan, 1999), (Lagoudakis and Parr, 2002), (Yao and Liu, 2008).

1/6: Bayesian RL (Dearden, Friedman, and Andre, 1999), (Wang, Lizotte, Bowling, and Schuurmans, 2005), (Poupart, Vlassis, Hoey, and Regan, 2006).

8/6: Multiagent RL (Schneider, Wong, Moore, and Riedmiller, 1999), (Guestrin, Lagoudakis, and Parr, 2002), (Kok and Vlassis, 2004).