Reinforcement Learning

AI, NLP, Reinforcement Learning

Using reinforcement learning for Conversation understanding

Reading Time: 3 minutes

I went to the NeurIPS 2019 conference in December and focused on NLP and reinforcement learning (RL) topics. The former is what I do for work: analyzing call center conversations, understanding what works in customer interactions, and making suggestions to clients based on their data. The latter is a personal interest that started all the way back when DeepMind beat the world’s best Go players. At the RL sessions, the tutorials mentioned using imitation learning for natural language understanding tasks and for generating responses to questions or chit-chat. People have had some success, but the approach has common pitfalls, such as repeating the most likely responses and producing responses that are too short. Sometimes the responses are so simple that the bots fall into a cycle of “I don’t understand what you are saying”.

One night at a social gathering, I got to meet Drs. David Silver and Richard Sutton. They briefly mentioned that I could try to set up the RL environment like a conversation and see if the agents can learn from the conversations. In one of the workshops related to NLG conversations, people also talked about using various rewards to penalize repetitiveness and encourage the generation of more varied text.

So that got me thinking. In addition to making a chatbot that mimics call-center agent and customer interactions, I can design an environment that helps me discover the reason why people are calling. What I can do is set up categories of actions such as “give refund”, “cancel service”, “keep the customer on the phone”, etc., and use regret (regret = reward of the best action – reward of the current action), where the reward is 1 if the action made the customer happy and 0 if it made the customer mad. This may help me find the action that, in a specific situation, most often leads to the outcome the client wants.

Granted, this might not directly give me the causal reasons why the customer called; it’s more like designing a potential causal relationship graph before building the model and then testing whether the causal relationship is correct. Even if it does not give me the exact cause, if it gives me the best action to take, then at least I have a product that does what the clients want. So the goal for the new year is to design an environment so I can test this idea and see if I can get the best type of action to take.
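
The regret idea described above can be sketched in a few lines. This is only an illustration: the action names come from the text, but the probabilities of a happy customer are made-up placeholders, not values measured from real calls, and the helper names are hypothetical. Regret is computed as the expected reward of the best action minus the expected reward of the chosen action, so the best action has zero regret.

```python
# Hypothetical action categories with ASSUMED probabilities that the
# customer ends up happy (reward 1) rather than mad (reward 0).
# These numbers are placeholders for illustration only.
HAPPY_PROB = {
    "give refund": 0.8,
    "cancel service": 0.5,
    "keep the customer on the phone": 0.2,
}


def regret(action, happy_prob=HAPPY_PROB):
    """Expected regret: best expected reward minus the chosen action's."""
    best = max(happy_prob.values())
    return best - happy_prob[action]


def best_action(happy_prob=HAPPY_PROB):
    """Action with the highest expected reward, i.e. zero regret."""
    return max(happy_prob, key=happy_prob.get)


if __name__ == "__main__":
    for name in HAPPY_PROB:
        print(f"{name}: regret = {regret(name):.2f}")
    print("best action:", best_action())
```

In a real setting the probabilities would not be known up front; the agent would have to estimate them from the outcomes of conversations.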

To familiarize myself with the Gym framework from OpenAI, I set up my own card game environment and made a double deep Q-learning network. Here are some resources if you want to start your own:

Taking some design hints from https://github.com/zmcx16/OpenAI-Gym-Hearts
Some more help to design the environment: https://datascience.stackexchange.com/questions/28858/card-game-for-gym-reward-shaping
Other projects using gym: https://awesomeopensource.com/projects/openai-gym
Probably already programmed here: https://awesomeopensource.com/project/datamllab/rlcard
Create your own gym environments:
https://github.com/openai/gym/blob/master/docs/environments.md
https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html
https://github.com/openai/gym/blob/master/docs/creating-environments.md
https://towardsdatascience.com/creating-a-custom-openai-gym-environment-for-stock-trading-be532be3910e
https://medium.com/@apoddar573/making-your-own-custom-environment-in-gym-c3b65ff8cdaa

For the conversation task, I need an environment that mimics a conversation. A reset starts a new conversation. Render shows how the conversation has gone up to a specific point. Each step can be one turn of the conversation, with text randomly chosen from the pool for the chosen category. A downside I can see with this approach is that the training data might not produce the best solution for a customer service session; it may only find the bare minimum needed to reach the desired outcome. But I hope that, because I can design my own regret in an RL framework, I can penalize things like the length of the conversation or a bad sentiment/emotional outcome while still trying to retain the customer.
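
Here is a minimal sketch of what such an environment could look like. It copies the classic Gym interface (reset, step returning four values, render) but is written in plain Python without importing gym, so it runs anywhere. The class name, the utterance pools, the rule that a refund makes the customer happy, and the per-turn penalty are all assumptions made up for illustration.

```python
import random


class ConversationEnv:
    """Gym-style conversation environment written without the gym dependency."""

    # Hypothetical action categories and utterance pools (illustration only).
    POOLS = {
        "give refund": ["I can refund that charge.", "Let me credit your account."],
        "cancel service": ["I can cancel that for you.", "Your service will end today."],
        "keep talking": ["Can you tell me more?", "Let me look into that."],
    }
    ACTIONS = list(POOLS)

    def __init__(self, max_turns=5, turn_penalty=0.1, seed=None):
        self.max_turns = max_turns
        self.turn_penalty = turn_penalty  # discourages long conversations
        self.rng = random.Random(seed)
        self.history = []

    def reset(self):
        """Start a new conversation and return the initial observation."""
        self.history = ["Customer: I have a problem with my bill."]
        return tuple(self.history)

    def step(self, action):
        """Play one agent turn; returns (observation, reward, done, info)."""
        category = self.ACTIONS[action]
        # The text of the turn is drawn at random from the category's pool.
        self.history.append("Agent: " + self.rng.choice(self.POOLS[category]))
        turns = len(self.history) - 1          # agent turns so far
        resolved = category == "give refund"   # ASSUMED toy rule for a happy customer
        done = resolved or turns >= self.max_turns
        reward = 0.0
        if done:
            # 1 for a happy customer, 0 otherwise, minus a penalty per turn.
            reward = (1.0 if resolved else 0.0) - self.turn_penalty * turns
        return tuple(self.history), reward, done, {"category": category}

    def render(self):
        """Print the conversation up to the current point."""
        print("\n".join(self.history))


if __name__ == "__main__":
    env = ConversationEnv(seed=0)
    env.reset()
    done = False
    while not done:
        _, reward, done, _ = env.step(random.randrange(len(env.ACTIONS)))
    env.render()
    print("reward:", reward)
```

The per-turn penalty is one way to express the idea of penalizing conversation length; a sentiment score could be folded into the reward in the same place.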

I encountered a similar situation before when I was making a chatbot using Rasa. It has a simpler RL environment, where a user can choose which route to take in a certain situation. But when I used it, the policy was too simple and did not achieve what I wanted; in particular, it did not give a causal relationship. I hope this could be integrated into this framework and be more useful.

AI, Reinforcement Learning

Introduction to Reinforcement Learning 2

Reading Time: < 1 minute

Continuing the journey of learning RL, I went through the deep Q-learning blog post.

https://www.freecodecamp.org/news/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8/

This time it required more setup in the Colab environment. First, you have to make sure you can install the VizDoom package on Colab. There are some dependencies that need to be satisfied. Also make sure you install the scikit-image package. This takes about 10 minutes, as the package needs to be built using CMake.

I initially just pulled the stock “basic.cfg” and “basic.wad” files from the GitHub site, but found out later that this caused problems. The big one was that the frames were sampled in RGB, which gives you the wrong dimensions. So now I just download them from the GitHub repo with the config files already set up. Nevertheless, it helped me understand what was being passed around.

A big thing to learn was, of course, the deep Q-network, which is in the class “DQNetwork”. I’m more used to programming in Keras, so it took a little more time to understand. You can do this in Keras, although you need to customize the loss function, which is actually easier to understand in TensorFlow.
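
To make the loss function less mysterious, here is a framework-free sketch of the standard deep Q-learning loss: the mean squared error between the Q-value of the action taken and the target, which is the reward plus the discounted best Q-value of the next state. This is not a copy of the tutorial's “DQNetwork” class; the function names and the batch layout are assumptions, and plain lists stand in for the network outputs.

```python
def td_target(reward, next_q_values, done, gamma=0.95):
    """Q-learning target: r if the episode ended, else r + gamma * max Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def dqn_loss(batch, gamma=0.95):
    """Mean squared error between targets and the Q-values of the actions taken.

    Each item in `batch` is (q_values, action, reward, next_q_values, done).
    The two lists of Q-values stand in for the network's outputs on the
    current state s and the next state s'.
    """
    squared_errors = []
    for q_values, action, reward, next_q_values, done in batch:
        target = td_target(reward, next_q_values, done, gamma)
        squared_errors.append((target - q_values[action]) ** 2)
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Two hypothetical transitions with three possible actions each.
    batch = [
        ([0.5, 1.0, 0.0], 1, 1.0, [0.0, 2.0, 1.0], False),
        ([0.2, 0.1, 0.4], 2, 0.0, [0.0, 0.0, 0.0], True),
    ]
    print("loss:", dqn_loss(batch))
```

In a real network only the Q-value of the action that was taken contributes to the loss, which is why a plain built-in mean squared error over all outputs does not work as-is in Keras.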

The rest seems straightforward: just let it run and learn. Here is the full Colab link.

https://colab.research.google.com/drive/1gZM4pAfH4kroa_44gNYZEE8RDVMiO9cP

AI, Reinforcement Learning

Introduction to Reinforcement Learning 1

Reading Time: < 1 minute

As I discussed before, there is not much good reinforcement learning material out there. But I found this great set of tutorials, and I will share my journey of learning from it.

Start by reading these two blogs:

  1. Blog Number 1
  2. Blog Number 2

At the end of the second blog, you will find a Jupyter notebook for the Frozen Lake tutorial.

https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Q%20learning/FrozenLake/Q%20Learning%20with%20FrozenLake_unslippery%20(Deterministic%20version).ipynb

Before you start the tutorial, you will likely need to learn how the Gym environment works. Go to this link and read the super basic tutorial they have there. Note especially what the components of each episode are. Figure out what the possible actions are and what each value of the state means.

http://gym.openai.com/docs/

Here is the wiki for the basic parameters. https://github.com/openai/gym/wiki/CartPole-v0

Refer to the source for what “Discrete” and “Box” are. https://github.com/openai/gym/tree/master/gym/spaces

Run the code on Google Colab and see how it runs. Print out the variables for each episode and step. I made an example in case you want to follow. https://colab.research.google.com/drive/1oqon14Iq8jzx6PhMJvja-mktFTru5GPl

Run the deterministic version first and then the stochastic one. Now you know how to create a super basic Q table.
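
For reference, here is a minimal Q table sketch for a deterministic 4x4 grid laid out like Frozen Lake. It does not use the Gym package: the map, the step function, and the hyperparameters are assumptions written out by hand so the example is self-contained. The behavior policy is purely random, which still works because Q-learning learns about the greedy policy regardless of how actions are picked; a decaying epsilon-greedy policy is a common refinement.

```python
import random

# 4x4 deterministic map: S start, F frozen, H hole, G goal.
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]
SIZE = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)  # walls keep the agent on the grid
    col = min(max(col + d_col, 0), SIZE - 1)
    tile = MAP[row][col]
    return row * SIZE + col, float(tile == "G"), tile in "HG"


def train(episodes=5000, alpha=0.8, gamma=0.95, max_steps=100, seed=0):
    """Tabular Q-learning with a purely random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(MOVES) for _ in range(SIZE * SIZE)]  # one row per state
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.randrange(len(MOVES))
            nxt, reward, done = step(state, action)
            # Terminal states have no future value, so the target is just the reward.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=100):
    """Follow the highest-valued action from the start; returns visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(len(MOVES)), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table = train()
    print("greedy path from start:", greedy_path(q_table))
```

Each row of the table is a state and each column an action, which matches the “Discrete” state and action spaces mentioned above.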

AI, Reinforcement Learning

Reinforcement learning

Reading Time: 2 minutes

Ever since the DeepMind AlphaGo paper came out, I have been fascinated by what reinforcement learning can do. In my opinion, it’s the branch of AI closest to Artificial General Intelligence, because no matter what your goal is, as long as you define the reward correctly, it will eventually help you find the “optimal” solution. What it lacks, perhaps, is speed: it takes a long time to learn. For example, human babies don’t have to stumble millions of times to learn how to walk. It’s even more evident for deer, which pretty much know how to walk right after they are born. Now, that is not to say evolution didn’t give the deer some prebuilt model to use, but maybe we can make some pretrained model for AI to know how to walk, like the modules they insert into Neo to give him fighting abilities.

When I set out to learn reinforcement learning, there wasn’t much formalized material. There is Sutton and Barto’s book, but I found it was, and still is, written for experts. The exercises they give are necessary for me to understand what’s going on, but no explanation is given. There is too much “for obvious reasons” for me to understand some basic concepts, like how to calculate the state value for each grid cell. There is a formula for it, but no detailed walkthrough of how to assign those values. Then there are David Silver’s lectures, which shed more light for me than the book, but they also leave too much detail unexplained. So I searched the internet for tutorials and finally landed on this Medium article: https://medium.com/free-code-camp/an-introduction-to-reinforcement-learning-4339519de419 It has enough little details for me to work out how to get to individual values, starting from what a Q table is and how to implement it in Python. I highly recommend going through the tutorial and exercises on Google Colab and making sure you understand how the environment is used in the Gym package.
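
For the state-value question mentioned above, here is the kind of walkthrough I was missing, as a small sketch. The setup is an assumption chosen for illustration: a 4x4 grid with terminal cells in the top-left and bottom-right corners, a reward of -1 per move, no discounting, and an agent that picks each of the four moves with probability 0.25. The value of a cell is then the average over the four moves of (reward + value of the cell you land in), and the code repeats that update until the values stop changing.

```python
SIZE = 4
TERMINALS = {0, SIZE * SIZE - 1}            # top-left and bottom-right corners
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def next_state(state, move):
    """Cell reached from `state` after `move`; hitting a wall means staying put."""
    row, col = divmod(state, SIZE)
    new_row, new_col = row + move[0], col + move[1]
    if 0 <= new_row < SIZE and 0 <= new_col < SIZE:
        return new_row * SIZE + new_col
    return state


def evaluate_random_policy(gamma=1.0, reward=-1.0, tol=1e-8):
    """Iterative policy evaluation for the uniformly random policy."""
    values = [0.0] * (SIZE * SIZE)
    while True:
        delta = 0.0
        for state in range(SIZE * SIZE):
            if state in TERMINALS:
                continue  # terminal cells keep a value of 0
            # Bellman expectation update: average over the four equally likely moves.
            new_value = sum(
                0.25 * (reward + gamma * values[next_state(state, move)])
                for move in MOVES
            )
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            return values


if __name__ == "__main__":
    grid_values = evaluate_random_policy()
    for row in range(SIZE):
        print(" ".join(f"{grid_values[row * SIZE + col]:7.2f}" for col in range(SIZE)))
```

With this setup, the cells next to a terminal corner settle at about -14 and the two far corners at about -22, which reads as the expected number of random moves needed to reach a terminal cell.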

Anyway, I’m still working through all of the code and understanding it. I also recommend a Lego Mindstorms set to give you a physical manifestation to work with. It can help you familiarize yourself with how to apply RL in the physical world. Basic robotics with a stable, non-bipedal robot also makes learning easier. Make sure you use a custom Python package like ev3dev so you can control the robot with Python scripts. https://www.ev3dev.org/ Now that’s how you have endless, time-sucking fun!