Assume you have an environment with two states, 1 and 2, and two possible actions in each state: LEFT and RIGHT. You have implemented an active Q-learning agent, which has currently learned the following Q-function values:

Q(1,left) = 0.2
Q(1,right) = 0.5
Q(2,left) = 0.6
Q(2,right) = 0.4

You are currently in state 2 after taking the action RIGHT from state 1 and receiving a reward of 0.4. What will be the value of Q(1,right) after it is updated? Assume the learning rate is now 0.5 and the discount factor is 0.9.
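One way to check the arithmetic is a short sketch, assuming the agent uses the standard Q-learning update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)]. The question does not state the update rule explicitly, so that is an assumption, and the helper name `q_update` is hypothetical.

```python
# Q-table from the question, keyed by (state, action).
q = {
    (1, "left"): 0.2,
    (1, "right"): 0.5,
    (2, "left"): 0.6,
    (2, "right"): 0.4,
}

alpha = 0.5  # learning rate
gamma = 0.9  # discount factor


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Return the updated Q(state, action) under the standard Q-learning rule (assumed)."""
    # Greedy value of the next state: max over its actions.
    best_next = max(v for (s, _), v in q.items() if s == next_state)
    td_target = reward + gamma * best_next
    return q[(state, action)] + alpha * (td_target - q[(state, action)])


# Transition described in the question: state 1, RIGHT, reward 0.4, now in state 2.
new_value = q_update(q, 1, "right", 0.4, 2, alpha, gamma)
print(round(new_value, 2))  # 0.72
```

Under that assumption the steps are: the best value in state 2 is Q(2,left) = 0.6, the target is 0.4 + 0.9 × 0.6 = 0.94, and the new estimate is 0.5 + 0.5 × (0.94 − 0.5) = 0.72.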