Reinforcement Learning
Q-Learning
Q-Learning is an algorithm in which the agent attempts to learn the optimal policy from its history of interactions with the environment.
Steps
We first initialize the Q-Table, which is the backbone of the Q-Learning algorithm. It stores the Q-Values for every state/action pair the agent will encounter in the environment.
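As a rough illustration of the initialization step, the Q-Table can be sketched as a nested list with one row per state and one column per action. The helper name `make_q_table`, the all-zero starting value, and the example sizes (4 states, 2 actions) are assumptions made for this sketch, not details from these notes.

```python
def make_q_table(n_states, n_actions, initial_value=0.0):
    """Build an n_states x n_actions table of Q-Values.

    Each row is a separate list, so updating one state's values
    never touches another state's row.
    """
    return [[initial_value] * n_actions for _ in range(n_states)]


# Hypothetical environment with 4 states and 2 actions per state.
q_table = make_q_table(n_states=4, n_actions=2)
```

Starting every entry at zero is a common default. It means the agent knows nothing about any state/action pair until it has tried that pair.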
To start populating the table with meaningful values, the agent needs to select an action at random in a given state and collect the associated reward.
If the action was bad, the Q-Value for that state/action pair decreases. If the action was good, the opposite happens and the Q-Value increases. Eventually the agent needs to stop exploring and start exploiting the information stored in the Q-Table.
This is where policies such as epsilon-greedy come into play.
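The two ideas above, adjusting a Q-Value after a reward and trading off exploration against exploitation, can be sketched together as follows. This is a minimal sketch that uses the standard Q-Learning update rule. The function names, the learning rate `alpha`, the discount factor `gamma`, and the tiny two-state table are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy_action(q_table, state, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit."""
    n_actions = len(q_table[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)  # explore: any action, uniformly
    row = q_table[state]
    # exploit: action with the highest Q-Value (ties go to the lowest index)
    return max(range(n_actions), key=lambda a: row[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * best Q-Value of next_state."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


# Hypothetical table: 2 states, 2 actions, all Q-Values start at zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]

# A good outcome (reward +1) raises the Q-Value for (state 0, action 1).
q_update(q_table, state=0, action=1, reward=1.0, next_state=1)

# A bad outcome (reward -1) lowers the Q-Value for (state 0, action 0).
q_update(q_table, state=0, action=0, reward=-1.0, next_state=1)

# With epsilon = 0 the agent only exploits, so it picks action 1 in state 0.
greedy_action = epsilon_greedy_action(q_table, state=0, epsilon=0.0)
```

In practice epsilon often starts near 1 (mostly exploring) and is lowered over time so the agent increasingly relies on the Q-Table. The decay schedule is a design choice these notes do not specify.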