Reinforcement Learning - OpenCat Gym

Hi there,

in the recent years there has been a lot of progress concerning deep reinforcement learning and many publications are available that prove that machine learning can create stable gaits for robots. Mainly interesting and relatively understandable papers are for example https://arxiv.org/abs/1804.10332, where the authors created a walking and also a galloping gait based on training in simulation and a later application on the robot. In a later publication they further advanced and made it possible to learn new gaits via reinforcement directly on the robot without simulation in less than 2 hours https://arxiv.org/abs/1812.11103 There are a lot more examples and different approaches to this.

Nevertheless, this is very remarkable and this made me thinking, whether we can get there also with Nybble and Bittle.

But let's slow down a little. What's reinforcement learning in particular? When you look at the graph below you can get a basic understanding how this works: The Agent, in our case Nybble/Bittle is set in an Environment (e.g. flat floor). There it performs Actions like moving the limbs somehow and trying to get a Reward from the Programmer. The Reward is only given when the State is what we actually want, e.g. moving forward. Trapped in this loop our robot will try to maximize the Reward in every iteration becoming better and better in the movement.

Source: https://en.wikipedia.org/wiki/Reinforcement_learning#/media/File:Reinforcement_learning_diagram.svg

So what I tried to do now, is to use a simulation environment with a flat ground together with a simulation model of Nybble and wanted to make it move forward. I implemented this in the gym training environment in PyBullet with the reinforcement library Stable-Baselines3 https://stable-baselines3.readthedocs.io/en/master/index.html# . There are a lot of functional learning algorithms one can use for reinforcement learning. So for training in my case I tried an algorithm called SAC (Soft Actor-Critic) that seems to be the current state-of-the-art algorithm for reinforcement learning and applied it on Nybble to see how it performs. And the results is definitely still more a crawling than a walking gait, but it shows the potential.

This is a result of reinforcement training only without any intervention from my side:

The next steps are to improve training and the resulting gaits. And once the gaits are good in simulation there are two ways. One is trying to get the learning policy running on Nybble/Bittle or learn it directly on them. I think there I have to use an additional set of hardware to make it run.

If you want to train a walking gait, you can find the link to my repository below, where I will provide further updates. Make sure to install all the necessary python libraries in the import section of the code.

https://github.com/ger01d/opencat-gym

35 comments

35 Comments

Commenting on this post isn't available anymore. Contact the site owner for more info.

Amit Banerjee

Jan 06

I wrote a detailed article mentioning some more details and workarounds for some of the issues I faced replicating this. https://www.learnwitharobot.com/p/training-a-reinforcement-learning

Amit Banerjee

Jan 07

Replying to

Thanks, but I think my Bittle is installed the same way. The video likely looks a bit different because the limbs didn't move properly as they were controlled by RL model.

Rongzhong Li

Jan 08

Replying to

The bulging curve on the shank was designed to resemble the muscle. So in terms of bionics, the muscle should be on the side where its shrinking will help support the major weight. On Bittle, it should be on the bottom side to look more natural.

Amit Banerjee

Jan 08

Replying to

Thanks! Now I get what you are saying. I didn't even realize that the two sides of the legs looked differently.

Amit Banerjee

Jan 05

Thanks for providing the code and references.

I was able to train a model and try it out on Bittle. However my trained model doesn't deliver the same smooth gait compared to the pre-existing one in the repository ( trained_agent_PPO.zip) I trained using the exact same code as in the Jupyter notebook at https://github.com/ger01d/opencat-gym/blob/main/opencat-gym.ipynb. I also trained with 5 million steps instead of 2 million in the repo. The result is much poorer tha what trained_agent_PPO gives.

I am wondering if trained_agent_PPO was created using some different settings such as a different reward function? Btw, greta job with open sourcing the code. It provides a great example to learn RL.

Servo Moves Erratically

Bittle X ESP32 Build Setup

Servos going crazy during Calibration mode

Reinforcement Learning - OpenCat Gym