A simple linear regression based approach for teaching a computer to play Tic-Tac-Toe. The system learns to play better by repeatedly playing against itself and learning to recognize good and bad moves. This is a solution to one of the problems in Chapter 1 of Machine Learning.

Approach

I just finished the first chapter of Machine Learning (Mcgraw-Hill International Edit) by Tom M. Mitchell. This chapter discusses the very basics of how to write a learning system and serves as an introduction to the rest of the book. As part of reading the chapter I decided to do exercise 1.5:

Implement an algorithm similar to that discussed for the checkers problem, but use the simpler game of tic-tac-toe. Represent the learned function Vestimate as a linear combination of board features of your choice. To train your program play it repeatedly against a second copy of the program that uses a fixed evaluation function you create by hand. Plot the percent of games won by your system, versus the number of training games played.

I’ve taken numerous courses on artificial intelligence but I’ve only solved problems such as sudoku solvers and classification learners. While this is a simpler problem, it is a system that when done will be able to provide a challenge to my own wit. I started out very curious as to how well a simple linear function learning system will be able to learn how to play the game.

To complete this problem I created a function that evaluate board states using a linear function which takes a hypothesis (7 weights) and features (6 of them) that are extracted from the board state. When playing the game the learner (the computer) gets all of the legal moves and applies the evaluation function to the new states to learn which move gets the highest rating from the evaluator, which it then acts on.

The six features that I extracted from every board state were (a row is 3 subsequent squares… the rows, columns, and diagonals):

x1 = # of instances where there are 2 x’s in a row with an open subsequent square.
x2 = # of instances where there are 2 o’s in a row with an open subsequent square.
x3 = # of instances where there is an x in a completely open row.
x4 = # of instances where there is an o in a completely open row.
x5 = # of instances of 3 x’s in a row (value of 1 signifies end game)
x6 = # of instances of 3 o’s in a row (value of 1 signifies end game)

I would give a the learner some random weights/hypothesis to start (I set w0,w1,…,w6 all equal to .5). Then I played it against another learner (with the same starting weights). After the game is over I generate training data from the game to use to refine the weights for the next game.

Vtrain(boardstate) = 100 if end of game and you won.
Vtrain(boardstate) = -100 if end of game and you lost.
Vtrain(boardstate) = 0 if end of game and a draw.
Vtrain(boardstate) = Vestimate(successor(boardstate)) in not the end of the game

Using this generated training data I update the weights using the least mean squares (LMS) method.

for each pair <boardstate, Vtrain(boardstate)>:

    use current weights to calculate Vestimate(boardstate).

    for each weight wi, update it as
       rwi = wi + learningConstant*(Vtrain(boardstate) - Vestimate(boardstate))*xi

(where learningConstant is a small constant like .1 or something that controls the rate at which the weights are updated) Once the weights are update the system is ready to player another game even smarter then the last!

The end results of this project were alright. I trained the system against a player that randomly chooses moves each turn 10,000 games before I played it. The computer was capable of leading most games into a draw, and only lost if I was really tricky! That being said I think my project was a success. On the other hand I was capable of beating it, and it has been shown that if you play perfect games you never lose (see the TinkerToy computer that has never lost a game here).

With chapter 1 complete I am ready (and excited!) to start on chapter 2.

I’ll keep you updated.

p.s. here is my code if anyone is interested (it is probably really buggy as I whipped it up pretty quick). It is written in python: TicTacToe.py.