Introduction

Reinforcement learning is a concept that has revolutionized the world of artificial intelligence. You might have seen AlphaGo beat Lee Sedol, one of the world's best Go players, in a 4-1 match, or seen AlphaGo's move 37, which baffled professional Go players such as Fan Hui with its creativity and beauty and shocked Lee Sedol himself.

Go Board

Move 37. While at first experts thought it was a mistake, Fan Hui saw the beauty in this unusual move.

That … is incredible.

AlphaGo was the result of a reinforcement learning algorithm. But what is that?

Definition

Reinforcement learning happens when an agent (in this case AlphaGo) is rewarded for a good result and penalized for a bad one: if it wins, it is rewarded. This is similar to housetraining a puppy. You might give him treats for taking his business outside but shout or clap loudly when he soils the carpet.

The process goes as follows:

  1. The agent performs an action on the environment (puppy takes business outside)
  2. An interpreter looks at the result of the action on the environment and decides to penalize or reward the agent (human gives the puppy treats)
  3. The agent is rewarded or penalized (puppy eats treats)
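The three steps above can be sketched in a few lines of Python. This is a toy illustration of the loop, not any real library's API: the "puppy" agent chooses between two made-up actions, the interpreter function judges the result, and the agent updates a running value estimate for each action from its rewards.

```python
import random

# Illustrative action names; nothing here comes from a real RL library.
ACTIONS = ["go_outside", "use_carpet"]

def interpreter(action):
    """Step 2: judge the action's effect and return a reward or penalty."""
    return 1.0 if action == "go_outside" else -1.0

# The agent keeps a running average reward (a value estimate) per action.
values = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}

random.seed(0)
for episode in range(1000):
    # Step 1: the agent acts (occasionally at random, otherwise greedily).
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    # Step 3: the agent receives the reward and updates its estimate.
    reward = interpreter(action)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]
```

After many episodes, the value estimate for `go_outside` ends up higher than the one for `use_carpet`, so the agent has "housetrained" itself.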

Because the AI is rewarded for good actions, it learns which moves are good and later forms strategies to beat the game and collect that reward. An AI like AlphaGo learns from playing many, many games and becomes absolutely brilliant at them.

Problems to Overcome

There are two main issues that reinforcement learning algorithms need to overcome.

One. The machine learning algorithm has to decide between breadth and depth, a tradeoff known as exploration versus exploitation. Should it explore different kinds of actions so that it better understands the environment, or focus on one particular kind of action it already understands well? (Think of someone experimenting with different sports versus focusing on getting better at soccer.)
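One common answer to this breadth-versus-depth question is epsilon-greedy selection: with a small probability the agent tries a random action (breadth), and otherwise it sticks with the action it currently believes is best (depth). Here is a minimal sketch on a made-up three-armed bandit; the payout numbers are invented for illustration.

```python
import random

random.seed(42)
true_payouts = [0.2, 0.5, 0.8]   # hidden quality of each action (unknown to the agent)
estimates = [0.0, 0.0, 0.0]      # the agent's learned value estimates
counts = [0, 0, 0]
epsilon = 0.1                    # fraction of the time spent exploring

for step in range(5000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: try anything
    else:
        action = estimates.index(max(estimates))  # exploit: use the best-known action
    # The environment pays out 1.0 with the action's hidden probability.
    reward = 1.0 if random.random() < true_payouts[action] else 0.0
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
```

The small random exploration keeps the agent from locking onto the first action that ever paid off, so over time its estimates converge and it spends most of its pulls on the genuinely best action.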

Two. Even with a proper understanding of the environment, what strategy (or policy) should the algorithm follow?

One Particular Difficulty

Say you are making a reinforcement learning algorithm for chess and it sacrifices a knight to capture a bishop. How do you know whether it is a good move, in order to reward or penalize the agent? At that point it is impossible to tell: twenty moves later, the sacrifice could work out brilliantly or fail miserably.

Chess Pieces

One way to get around this problem is to judge every move by the final result. Did it lose the game? Oh well, every move it performed in that game is now a bad move. It won?! Great! Every move, no matter how bad some of them were, is a great move.

It may seem like a terrible way to do it: a good move might be penalized simply because the game was lost. But over the long run, across thousands of games, a good move makes winning more likely. So on average, good moves end up rewarded and bad moves end up penalized.
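Here is a small sketch of that averaging idea. The "game" below is a made-up stand-in for chess: each game is ten randomly chosen moves, better moves secretly raise the chance of winning, and after each game every move played receives the final outcome (+1 for a win, -1 for a loss) as its reward. All the names and numbers are invented for illustration.

```python
import random

random.seed(1)

# Hidden true quality of each move; the learner never sees these numbers.
move_quality = {"good": 0.9, "okay": 0.5, "bad": 0.1}

def play_game():
    """Play 10 random moves; better moves raise the win probability."""
    moves = [random.choice(list(move_quality)) for _ in range(10)]
    win_prob = sum(move_quality[m] for m in moves) / len(moves)
    outcome = 1.0 if random.random() < win_prob else -1.0
    return moves, outcome

totals = {m: 0.0 for m in move_quality}
counts = {m: 0 for m in move_quality}

for game in range(20000):
    moves, outcome = play_game()
    for m in moves:
        # Every move in the game gets the game's final outcome as reward.
        totals[m] += outcome
        counts[m] += 1

averages = {m: totals[m] / counts[m] for m in move_quality}
```

Even though each individual game blames or credits every move equally, the averages separate: games containing more good moves are won more often, so the good moves accumulate a higher average reward than the bad ones.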

This also works in practice.

Fun Examples

Here are some really amazing and interesting examples of the algorithm at work.

This game rewards the player for hitting checkpoints and collecting powerups. The reinforcement learning algorithm failed here because the interpreter gave too much reward for collecting powerups. As a result, the agent only collected powerups, which is much easier than finishing the race.

An algorithm running in real life! This reinforcement learning algorithm balances a pencil, showing that reinforcement learning can be applied to physical objects.

Here’s a great one by Google which shows a “person” learning to walk. This AI overcomes the problem of overfitting (which happens when an AI only works in one specific situation and cannot adapt to new situations or data), since it can react to new forces and new obstacles. An AI built for one specific obstacle might get really good at it but fail as soon as the obstacle is changed.

More Fun Examples