RL Competition 2009

News

Slides from the workshop presentations are now available.

Results are now available. Congratulations to our winners.

Testing round closed. Thank you to all of our competitors!

Updated Testing application (R15) is now available HERE.

Proving application is now available HERE.

The rules, schedule, and prizes have been announced.

GAME ON! The software is now available.

Stay Informed

Sign up for our mailing list to receive important announcements about the competition.

Tetris

Tetris is a falling-blocks puzzle video game originally designed and programmed by Alexey Pajitnov in 1985. In Tetris, a pseudorandom sequence of tetrominoes (sometimes called "tetrads" in older versions) - shapes composed of four square blocks each - fall down the playing field. The object of the game is to manipulate these tetrominoes by moving each one sideways and rotating it by 90 degree units, with the aim of creating a horizontal line of blocks without gaps. When such a line is created, it disappears, and the blocks above (if any) fall. The game ends when the player "tops out", that is, when the stack of tetrominoes reaches the top of the playing field and no new tetrominoes are able to enter. Despite its simple rules, playing tetris well requires a complex strategy and lots of experience. Furthermore, Demaine et al. have shown that Tetris is hard in a mathematical sense as well [Demaine et al., 2003]: finding the optimal strategy is NP-hard even if the sequence of tetrominoes is known in advance. These properties make Tetris an appealing benchmark problem for testing reinforcement learning (and other machine learning) algorithms.

Tetris was first formalized as a Markov decision process by Tsitsiklis and Van Roy (1996). They achieved surprising perfomance using a simple linear architecture and a small set of expert game features. There have been several studies on applying reinforcement learning methods to Tetris, but none have significantly outperformed Tsitsiklis and Van Roy's results (see [Ramon and Driessens, 2004], [Bertsekas and Tsitsiklis, 1996], [Lagoudakis et al., 2002], [Farias and van Roy, 2006] and [Kakade, 2001]). Recent work by Szita & Lorincz (2006) presented result from a cross entropy method that are several orders of magnitude better than Tsitsiklis and Van Roy. The winners of the 2008 RL Competition, Bruno Scherrer, Christophe Thiery, and Amine Boumaza topped that [Thiéry, 2007]. Is it possible to create a reinforcement learning agent for the 2009 Competition that is even better?

The competition domain is based on the Van Roy (1995) specification of the Tetris problem. There are some differences, however:

  • In the competition problem the agent chooses to rotate or move the peice one space, as it falls (like the video game). In Van Roy's specification the agent chooses a final position and orientation for each piece when it first appears.
  • Clearing four lines at once is worth more than clearing single lines four times (like in the video game).
  • The distribution of piece types is not uniform. For example, in some games, there may be an abundance of "Z" pieces, in others, there will be a shortage. But you may expect that the probability of getting one will be lower when you really need one...
  • The competition version of tetris generalized. See the Rules Page for more information about the generalized evaluation paradigm. The variable parameters include the board width and height, the distribution of the tetromino types, and the type and extent of adversariality of the system.

Getting started

It is easy! in the /public/agents/tetrisAgentJava directory, you will find a simple Java agent that plays Tetris. It does no reinforcement learning (or any kind of learning), and it does not play well at all. However, it does contain a few useful routines for processing the board, testing tetromino placements, etc., along with some comments. All you have to do is to plug your favorite RL algorithm in. The readme.txt file in the agent's directory contains more detailed info about the usage of te sample agent.

Technical Details

Observation Space: high dimensional, discrete valued containing:

  1. bit map: binary representation of the current board (row major order)
  2. bit vector: indentifying the falling piece
  3. two integers: board size, number of rows and columns

Action Space: 1 dimensional, discrete valued

  1. manipulation of falling piece, eg: rotate and move left, right, down and drop

Rewards: positive function of rows eliminated on each step. Scoring possibilites for different Tetris parametrizations may be wildly different. To give an equal weight to each MDP in tcalculating the final score, rewards are normalized: a benchmark agent reaches about 100,000 points on each MDP. You do not have to care about normalization: the real-valued reward you get from the environment is exactly what is added up to get the final score of the agent.

Note: the competition software will provide your agent with a task specification string that describes the basic inputs and outputs of the particular problem instance your agent is facing. For the competition, the ranges provided in task specification may not be tight; they provide a rough approximation of the actual observation and action ranges. More documentation of the the task specification string can be found here .

References

  • [Bertsekas and Tsitsiklis, 1996] Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
  • [Demaine et al., 2003] Demaine, E. D., Hohenberger, S., and Liben-Nowell, D. (2003). Tetris is hard, even to approximate. In Proc. 9th International Computing and Combinatorics Conference (COCOON 2003), pages 351–363.
  • [Farias and van Roy, 2006] Farias, V. F. and van Roy, B. (2006). Probabilistic and Randomized Methods for Design Under Uncertainty, chapter Tetris: A Study of Randomized Constraint Sampling. Springer-Verlag UK.
  • [Kakade, 2001] Kakade, S. (2001). A natural policy gradient. In Advances in Neural Information Processing Systems (NIPS 14), pages 1531–1538.
  • [Lagoudakis et al., 2002] Lagoudakis, M. G., Parr, R., and Littman, M. L. (2002). Least-squares methods in reinforcement learning for control. In SETN '02: Proceedings of the Second Hellenic Conference on AI, pages 249–260, London, UK. Springer-Verlag.
  • [Ramon and Driessens, 2004] Ramon, J. and Driessens, K. (2004). On the numeric stability of gaussian processes regression for relational reinforcement learning. In ICML-2004 Workshop on Relational Reinforcement Learning, pages 10–14.
  • [Szita and Lorincz, 2006] Istvan Szita, András Lörincz: Learning Tetris Using the Noisy Cross-Entropy Method. Neural Computation 18(12): 2936-2941 (2006).
  • [Thiéry, 2007] Christophe Thiéry: Contrôle optimal stochastique et le jeu de Tetris. Master's Thesis, Université Henri Poincaré – Nancy I (2007).