Reinforcement Learning

Q-Learning

Q-Learning is an algorithm in which the agent attempts to learn the optimal policy from its history of interactions with the environment.

Steps

We first initialize the Q-Table, which is the backbone of the Q-Learning Algorithm.
It stores a Q-Value for every state/action pair the agent can encounter in the environment.
To start populating the table with meaningful numbers, the agent randomly selects an action in its current state and collects the associated reward.
If the action was bad, the Q-Value for that state/action pair decreases. If the action was good, the opposite happens and the Q-Value increases.
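Eventually the agent needs to stop exploring and start exploiting the information in the Q-Table.
This is where policies such as epsilon-greedy come into play: with probability epsilon the agent picks a random action, and otherwise it picks the action with the highest Q-Value for the current state.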
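
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the five-state corridor environment, the names step, epsilon_greedy and train, and the values chosen for alpha (learning rate), gamma (discount factor) and the epsilon decay schedule are all hypothetical and do not come from the text. The update line uses the standard Q-Learning rule, which the steps above describe only informally.

```python
import random

# Hypothetical toy environment (an assumption, not from the text):
# a corridor of states 0..4, actions 0 = left and 1 = right, goal at state 4.
N_STATES, N_ACTIONS = 5, 2
GOAL = N_STATES - 1


def step(state, action):
    """Move one cell left or right; reward 1.0 on reaching the goal, else 0.0."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties randomly so an all-zero row does not always pick action 0.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def train(episodes=200, alpha=0.1, gamma=0.9,
          epsilon=1.0, decay=0.98, min_epsilon=0.05, seed=0):
    rng = random.Random(seed)
    # Initialize the Q-Table: one row per state, one Q-Value per action.
    q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q_table[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max Q(s', .) - Q(s, a))
            future = 0.0 if done else gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (reward + future - q_table[state][action])
            state = next_state
        # Shift gradually from exploring to exploiting.
        epsilon = max(min_epsilon, epsilon * decay)
    return q_table


q_table = train()
# Greedy action for each non-terminal state, read straight from the Q-Table.
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a])
                 for s in range(GOAL)]
```

Starting epsilon at 1.0 makes the first episodes fully random, matching the "randomly select an action" step, while the decay moves the agent toward exploitation as the table fills in. Keeping a small minimum epsilon is one common design choice so that the agent never stops exploring entirely.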