## BlackJack with Monte Carlo Prediction

In Dynamic Programming (DP), we solve the Markov Decision Process (MDP) by using value iteration and policy iteration. Both of

Learning from human preferences is a major breakthrough in reinforcement learning (RL). The algorithm was proposed by researchers at OpenAI

Are you a fan of chess? If I asked you to play chess, how would you play the

In TRPO, we improve the policy subject to a constraint: the KL divergence between the old policy and the new policy must be less than some constant.
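
As a rough sketch, the constrained problem is usually written as below. The symbols are assumptions for illustration and do not come from this excerpt: $\pi_{\theta_{\text{old}}}$ is the old policy, $\pi_\theta$ the new one, $A$ an advantage estimate, and $\delta$ the constant that bounds the KL divergence.

$$
\max_{\theta} \; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, A^{\pi_{\theta_{\text{old}}}}(s, a) \right]
\quad \text{subject to} \quad
\mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}\!\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \right) \right] \le \delta
$$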

In this post, we will look at Q-learning, a popular off-policy TD control algorithm that is both simple and widely used.
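
To make the idea concrete, here is a minimal sketch of a single tabular Q-learning update. The function name `q_learning_update`, the dictionary-based Q-table, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are all assumptions chosen for illustration, not code from the post.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a mapping from (state, action) pairs to estimated values.
    All names and defaults here are hypothetical, for illustration only.
    """
    # Off-policy: bootstrap from the greedy (max) action in the next state,
    # regardless of which action the behaviour policy will actually take.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the TD target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: an empty Q-table where unseen pairs default to 0.0.
q_table = defaultdict(float)
q_learning_update(q_table, "s0", "left", 1.0, "s1", ["left", "right"])
```

Taking the maximum over next-state actions, rather than the value of the action actually taken next, is what makes the update off-policy.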

We will learn a technique called hindsight experience replay (HER), proposed by OpenAI researchers for dealing with sparse rewards.

Reinforcement learning (RL) is a branch of machine learning where learning occurs through interaction with an environment. It is