9/3 - 6/4: MDPs, values, policies, dynamic programming, Monte Carlo, TD learning --- Survey paper, B.1-8.
Assignment: Implement policy iteration using various methods on a toy MDP.
27/4: Reward shaping (Ng, Harada, and Russell, 1999), and inverse RL (Ng and Russell, 2000), (Ramachandran and Amir, 2007).
4/5: POMDPs (Singh, Jaakkola, and Jordan, 1994), (Cassandra, Kaelbling, and Littman, 1994), (Spaan and Vlassis, 2005).
11/5: Monte Carlo planning/RL (Kearns, Mansour, and Ng, 1999), (Kocsis and Szepesvari, 2006).
18/5: Actor-critic algorithms (Konda and Tsitsiklis, 2000), (Sutton, McAllester, Singh, and Mansour, 2000), (Peters and Schaal, 2006).
25/5: Least-squares methods in RL (Boyan, 1999), (Lagoudakis and Parr, 2002), (Yao and Liu, 2008).
1/6: Bayesian RL (Dearden, Friedman, and Andre, 1999), (Wang, Lizotte, Bowling, and Schuurmans, 2005), (Poupart, Vlassis, Hoey, and Regan, 2006).
8/6: Multiagent RL (Schneider, Wong, Moore, and Riedmiller, 1999), (Guestrin, Lagoudakis, and Parr, 2002), (Kok and Vlassis, 2004).