RL Competition 2009

News

Slides from the workshop presentations are now available.

Results are now available. Congratulations to our winners.

Testing round closed. Thank you to all of our competitors!

Updated Testing application (R15) is now available HERE.

Proving application is now available HERE.

The rules, schedule, and prizes have been announced.

GAME ON! The software is now available.

Stay Informed

Sign up for our mailing list to receive important announcements about the competition.

Domains

For questions, suggestions, or bug reports, please join the discussion forums.

Acrobot

Agents must balance a two-jointed virtual gymnast. Challenges include:

  • Nonlinear dynamics: the controls are discrete, but the state space is continuous and the dynamics are nonlinear.
  • Explore / exploit: this is a parameterized variant of the standard RL benchmark, so quickly identifying the model within that parameterized space is critical to good performance.
  • Precision control: because this domain is simpler than many others, competition will be fierce. Adaptive discretization or precise function approximation may make only a small difference, but it might be enough to win the event!
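As one illustration of the function-approximation route, tile coding overlays several shifted grids on the continuous state, so each state activates exactly one tile per grid, and a linear action-value estimate sums the weights of the active tiles. This is a minimal sketch, not a competition-provided agent: the helper names `tile_indices` and `TileCodedQ`, the state bounds, and the step size are hypothetical, and nothing here assumes how the competition parameterizes the acrobot.

```python
def tile_indices(state, lows, highs, tiles_per_dim, num_tilings):
    """Return one active feature index per tiling for a continuous state.

    Each tiling is a uniform grid over [lows, highs] shifted by a fraction of a
    tile width, so nearby states share most, but not all, active tiles.
    """
    dims = len(state)
    cells_per_dim = tiles_per_dim + 1          # one extra cell absorbs the shift
    tiles_per_tiling = cells_per_dim ** dims
    indices = []
    for t in range(num_tilings):
        flat = 0
        for d in range(dims):
            width = (highs[d] - lows[d]) / tiles_per_dim
            shift = width * t / num_tilings
            x = min(max(state[d], lows[d]), highs[d])  # clamp out-of-range values
            cell = int((x - lows[d] + shift) // width)
            flat = flat * cells_per_dim + cell     # row-major index within tiling
        indices.append(t * tiles_per_tiling + flat)
    return indices


class TileCodedQ:
    """Hypothetical linear action-value estimate: one weight vector per action."""

    def __init__(self, n_actions, n_features, num_tilings, alpha=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.step = alpha / num_tilings  # split the step size across tilings

    def value(self, tiles, action):
        return sum(self.w[action][i] for i in tiles)

    def update(self, tiles, action, target):
        """Move Q(s, action) toward `target`, e.g. a TD target."""
        error = target - self.value(tiles, action)
        for i in tiles:
            self.w[action][i] += self.step * error
```

A learner would call `tile_indices` on each observation and move `value(tiles, a)` toward a target such as `r + gamma * max_b value(next_tiles, b)`. Adding tilings refines the effective resolution without shrinking individual tiles, which is one simple lever for the precision this domain rewards.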

Get more details on the Acrobot domain here.

Adversarial Tetris

Agents play Tetris with several twists. New pieces arrive according to probability distributions shaped by an adversary, who chooses the piece it judges to be the worst fit for the current board. Challenges include:

  • Basic piece placement: Which location should be selected to maximize both short- and long-term reward?
  • Opponent modeling: the best agents can reason about which pieces the adversary is likely to send next and plan their piece placements accordingly. Is it possible to fool the adversary?
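The page does not say exactly how the adversary turns its "worst fit" judgment into a piece distribution. The sketch below assumes one simple rule: for every piece, compute the best board value the agent could reach by placing it, then give extra probability mass `bias` to the piece whose best value is lowest and spread the rest uniformly. The column-height board, the flat-bottomed toy pieces, and the heuristic `toy_evaluate` are hypothetical simplifications that ignore rotations and holes.

```python
def adversarial_piece_distribution(board, pieces, placements, evaluate, bias=0.8):
    """Probability of each next piece, skewed toward the worst fit (assumed rule).

    For every piece, the agent's best reachable value is the max of `evaluate`
    over all boards returned by `placements(board, piece)`. The piece with the
    lowest best value gets `bias` extra probability; the remaining 1 - bias is
    spread uniformly over all pieces.
    """
    best = {}
    for piece in pieces:
        values = (evaluate(b) for b in placements(board, piece))
        # A piece with no legal placement ends the game: treat it as worst.
        best[piece] = max(values, default=float("-inf"))
    worst = min(pieces, key=lambda p: best[p])
    base = (1.0 - bias) / len(pieces)
    return {p: base + (bias if p == worst else 0.0) for p in pieces}


# Hypothetical toy model: a board is a tuple of column heights and each piece
# is a flat-bottomed block (width, height). Rotations and holes are ignored.
TOY_PIECES = {"I": (1, 4), "O": (2, 2)}


def toy_placements(board, piece):
    width, height = TOY_PIECES[piece]
    boards = []
    for left in range(len(board) - width + 1):
        base = max(board[left:left + width])  # the block lands on the tallest column
        new = list(board)
        for c in range(left, left + width):
            new[c] = base + height
        boards.append(tuple(new))
    return boards


def toy_evaluate(board):
    """Higher is better: penalize the tallest column and surface bumpiness."""
    bumpiness = sum(abs(board[i] - board[i + 1]) for i in range(len(board) - 1))
    return -max(board) - bumpiness


dist = adversarial_piece_distribution((0, 0, 3, 3), ["I", "O"], toy_placements, toy_evaluate)
print(dist)  # the vertical "I" fits this board worst, so it receives most of the mass
```

An agent that can estimate such a distribution could weigh each candidate placement by the pieces likely to follow it, rather than assuming pieces arrive uniformly.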

Get more details on the Tetris domain here.

Helicopter

This domain is based on the helicopter simulator from Andrew Ng's group. Agents must control a helicopter so that it hovers stably. Challenges include:

  • Dynamics: Wind effects and complicated nonlinear dynamics make this a challenging problem.
  • Explore / exploit: crashing the helicopter is catastrophic, so agents must explore carefully to avoid unrecoverable errors.

Get more details on the helicopter domain here.

Infinite Mario

Agents play a variant of Super Mario, a complete side-scrolling video game with destructible blocks, enemies, fireballs, coins, chasms, platforms, and more. The state space is complicated but factored in an object-oriented way that captures many aspects of the real world. Challenges include:

  • Path planning: How can Mario navigate around simple obstacles, or through complicated sets of blocks?
  • Option learning and execution: Are there reusable sensory-motor primitives which simplify planning? Can these be learned?
  • Explore / exploit: Do enemies always behave the same way? Are there stochastic effects of blocks that can be learned?

Get more details on the Mario domain here.

Octopus Arm

Agents must control a long, flexible octopus arm and make it grab food without leaving its tank or bumping into itself. Challenges include:

  • Dynamics: this problem challenges competitors with a continuous, high-dimensional state space (82 dimensions), a high-dimensional action space (32 dimensions), and nonlinear dynamics.
  • Learning options / hierarchical actions: with smooth, nonlinear dynamics, a reasonable approach to control is to learn sensory-motor primitives. How can such primitives be learned?
  • Learning action effects: The actions have complicated effects on many aspects of the state space, but actions are closely related to each other. With such a high-dimensional action space, modelers must share statistical strength between samples of different actions.
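One concrete way to share statistical strength across actions is to give every action the same parametric model, so each sample, whichever action produced it, refines the same parameters. The sketch below fits a linear model s' ≈ A·s + B·a + c by least squares with NumPy. The arm's real dynamics are nonlinear, so this is at best a local approximation and an assumption made for illustration; the function names are hypothetical, and only the 82-dimensional state and 32-dimensional action sizes come from the description above.

```python
import numpy as np


def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ A s + B a + c from a batch of transitions.

    states: (n, ds), actions: (n, da), next_states: (n, ds).
    Every row, whatever action it used, constrains the same A, B and c.
    """
    n, ds = states.shape
    da = actions.shape[1]
    X = np.hstack([states, actions, np.ones((n, 1))])  # shared regressors
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)  # shape (ds + da + 1, ds)
    A = W[:ds].T
    B = W[ds:ds + da].T
    c = W[ds + da]
    return A, B, c


def predict(A, B, c, state, action):
    """Predicted next state, defined even for actions never seen in training."""
    return A @ state + B @ action + c
```

Because `B` is shared, `predict` returns an estimate for any 32-dimensional action vector, including ones that never appeared in the data; a per-action lookup table could not generalize that way.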

Get more details on the octopus arm domain here.

Polyathlon

Competitors must code a general-purpose RL agent. Agents are tested on a variety of MDPs that share no systematic structure, which forces the agent to learn quickly and reason flexibly about general MDPs. Challenges include:

  • Explore / exploit: in a general MDP, the explore/exploit dilemma is key. Although theoretical analyses exist for several algorithms that navigate this tradeoff, which will perform best in practice?
  • Structure learning: is there structure in the space of rewards or state transitions?
  • Aggregation: can states be aggregated, either to learn an improved model or accelerate planning?
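A classic algorithm with theoretical explore/exploit guarantees for finite MDPs is R-max (Brafman and Tennenholtz): state-action pairs tried fewer than `m` times are treated as if they paid the maximum reward forever, so planning on the learned model steers the agent toward them until they become known. The following is a minimal tabular sketch, assuming small discrete state and action spaces and a known reward bound `r_max`; the class name and parameters are hypothetical, and this is not a reference Polyathlon agent.

```python
from collections import defaultdict


class RMaxAgent:
    """Tabular R-max-style agent (hypothetical sketch).

    Pairs visited fewer than `m` times are 'unknown' and valued optimistically
    at r_max / (1 - gamma); known pairs use the empirical reward and
    transition model, and Q is recomputed by value iteration.
    """

    def __init__(self, n_states, n_actions, r_max, m=5, gamma=0.95):
        self.nS, self.nA = n_states, n_actions
        self.r_max, self.m, self.gamma = r_max, m, gamma
        self.counts = [[0] * n_actions for _ in range(n_states)]
        self.reward_sum = [[0.0] * n_actions for _ in range(n_states)]
        self.trans = [[defaultdict(int) for _ in range(n_actions)] for _ in range(n_states)]
        self.q = [[r_max / (1 - gamma)] * n_actions for _ in range(n_states)]

    def observe(self, s, a, r, s_next):
        # As in R-max, the model of a pair is frozen once it becomes known.
        if self.counts[s][a] < self.m:
            self.counts[s][a] += 1
            self.reward_sum[s][a] += r
            self.trans[s][a][s_next] += 1
            if self.counts[s][a] == self.m:
                self.plan()  # replan only when a pair first becomes known

    def plan(self, iters=200):
        optimistic = self.r_max / (1 - self.gamma)
        for _ in range(iters):  # synchronous value iteration, fixed sweep count
            v = [max(row) for row in self.q]
            for s in range(self.nS):
                for a in range(self.nA):
                    n = self.counts[s][a]
                    if n < self.m:
                        self.q[s][a] = optimistic
                    else:
                        r = self.reward_sum[s][a] / n
                        exp_v = sum(c * v[s2] for s2, c in self.trans[s][a].items()) / n
                        self.q[s][a] = r + self.gamma * exp_v

    def act(self, s):
        row = self.q[s]
        return row.index(max(row))  # greedy; ties go to the lowest action index
```

In a loop, call `act(s)`, step the environment, and pass the transition to `observe`. Whether this kind of optimism beats simpler heuristics such as epsilon-greedy on the Polyathlon MDPs is exactly the practical question raised above.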

Get more details on the Polyathlon here.