THESIS

An Analysis of Deep Q-Networks and Applications of Generative Adversarial Networks in Reinforcement Learning

B. Tech Thesis Spring 2020

Abstract

Reinforcement Learning (RL) is a subfield of Machine Learning strongly influenced by how humans learn. Humans learn from their surroundings through multidimensional sensory inputs (visual, auditory, tactile, and olfactory stimuli, among others) and the corresponding rewards, and thereby arrive at the optimal behavior in a given situation. In RL, optimal behavior means behavior that maximizes reward. Reinforcement Learning incorporates these basic ideas to enable an agent to learn the optimal policy to follow in an environment. In this project we analyze and present the Deep Q-Learning algorithm along with its most important variants.
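As a concrete illustration of the reward-maximization idea, the following is a minimal tabular Q-learning sketch. The environment (a three-state chain), the learning rate, and all other numbers are invented for this example and are not taken from the thesis; Deep Q-Learning replaces the table below with a neural network.

```python
import random

# Illustrative environment: a 3-state chain 0 -> 1 -> 2.
# Reaching state 2 ends the episode with reward 1; all other steps give 0.
N_STATES, ACTIONS = 3, ("left", "right")
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

# Q-table: expected discounted return for each (state, action) pair
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def step(s, a):
    """Deterministic transition; the episode terminates at state 2."""
    s2 = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

random.seed(0)
for _ in range(200):                      # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(Q[s], key=Q[s].get))
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward the bootstrapped target
        target = r + (0.0 if done else gamma * max(Q[s2].values()))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

print(round(Q[0]["right"], 2), round(Q[1]["right"], 2))
# approaches gamma * 1 = 0.9 from state 0 and 1.0 from state 1
```

The learned values approach the true discounted returns (0.9 and 1.0 here), which is exactly the "optimal behavior implies maximized rewards" notion the abstract describes.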

Generative Adversarial Networks (GANs) have been described as the most interesting idea in Machine Learning in the last ten years. In Section 13 of this project we present an attempt to generate the transition probabilities of a simple Markov Decision Process using GANs, with only the data the agent receives from the environment. Building on this result, we hope to extend the approach to more complex MDPs in Reinforcement Learning, and ultimately to generate valid Experience Replay samples with the GAN, which would significantly reduce the memory requirements of the learning process in Deep RL tasks.
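For context on the memory cost the abstract aims to reduce, below is a minimal sketch of the conventional experience replay buffer used in Deep Q-Learning. The class name, capacity, and transition fields are illustrative choices, not taken from the thesis; every stored transition occupies memory, which is what a GAN generating valid samples on demand could avoid.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch for the Q-network update
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Toy usage: push more transitions than the buffer can hold
buf = ReplayBuffer(capacity=1000)
for t in range(1500):
    buf.push(t, 0, 0.0, t + 1, False)   # dummy transition data
batch = buf.sample(32)                  # only the newest 1000 survive
```

In image-based Deep RL tasks each stored state is a stack of frames, so buffers of this kind can consume gigabytes; replacing the stored samples with GAN-generated ones is the memory saving the abstract refers to.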

Trained Models

[Animations of the agent before, during, and after training]