
The quick summary of my question: I'm trying to solve a clone of the Flappy Bird game found on the internet with the Reinforcement Learning algorithm Proximal Policy Optimization (PPO). I've run into an issue with designing the reward system. How can I specify a reward for the agent, given that it's a third-party game that does not return anything to me, and the only information I get is the visual information from the window?

Some details and background: before trying to solve a third-party game, I played with several OpenAI Gym environments such as Cart-Pole, Mountain Car, Lunar Lander and, recently, Car Racing. To solve them I used PG, DQN, Actor-Critic and PPO algorithms. After understanding how to work with problems where the state is an image, I decided to take on a new challenge and try to get out of the sandbox (Gym).

I picked Flappy Bird because it's simple in concept, the action space is small (effectively two actions: flap or do nothing), and it's notoriously hard for humans. My code can be found here: https://github.com/Mike-Kom/Flappy-Bird-PPO The agent class and buffer were tested on Car Racing, so there shouldn't be any issues with the RL algorithm. The neural net was changed a little due to a different state size, but conceptually it's the same, so there should not be any problems either. My current guess is that the reward system is not robust and causes the agent not to learn properly. Currently I'm just giving the agent 0.025 points each step and 2 points from the 25th frame onward (I found that this is exactly the frame at which the agent passes between the first two pipes), but it does not seem to work. Any suggestions on how to solve an external environment, and especially on how to design the reward system, are welcome!
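
To make the idea concrete, here is a minimal sketch of the kind of reward function I mean. This is not the code from my repo: the saved game-over template, the crop coordinates and the threshold are placeholders I would still have to tune for my window, and the "passed a pipe" bonus is just the frame-count heuristic described above.

```python
# Hedged sketch: reward shaping for an external game where the only feedback
# is the captured frame. GAME_OVER_TEMPLATE, TEMPLATE_REGION and the threshold
# are hypothetical placeholders, not values from the linked repository.
import numpy as np

GAME_OVER_TEMPLATE = np.load("game_over_crop.npy")   # pre-saved grayscale crop of the "Game Over" banner
TEMPLATE_REGION = (slice(180, 220), slice(60, 200))   # rows/cols where the banner appears (placeholder)

def compute_reward(frame_gray: np.ndarray, frames_survived: int):
    """Return (reward, done) from a single grayscale frame.

    Reward idea:
      * small living bonus every step, so the agent prefers to stay alive
      * extra bonus from the 25th frame onward (roughly when the bird has
        passed between the first two pipes)
      * a penalty and episode termination when the game-over screen is detected
    """
    crop = frame_gray[TEMPLATE_REGION]
    # crude terminal detection: mean absolute pixel difference against the saved banner
    game_over = np.mean(np.abs(crop.astype(np.float32) - GAME_OVER_TEMPLATE)) < 10.0

    if game_over:
        return -1.0, True        # terminal penalty for crashing

    reward = 0.025               # per-step living reward
    if frames_survived >= 25:
        reward += 2.0            # bonus once the bird is past the first pipe
    return reward, False
```
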

Sorry if the code is messy and not professional; it was originally meant to be just for me :) Programming is just my hobby, and my occupation is far from code writing. Moreover, this is my first question here, and I wanted to take the opportunity to thank all of you for writing your answers and suggestions for different questions! You make this community super helpful for so many people! Even though I have not written a question before, I have found a ton of answers and good suggestions here :)

Mike K
  • if you don't have feedback from the env, you don't train on that env... you code a basic program to simulate it, and then you feed information from the original one... for example, you can generate a "segmentation" image, train the NN on the segmented image, and in the meantime train another NN to segment the images from the original game. In this way, the evaluation will be: 1) first NN from game to segmentation, 2) second NN from segmentation to whatever reinforcement learning you are using (if you want, the NN for the action might take something different as input instead of the image) – Alberto Sinigaglia Sep 01 '22 at 17:47
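
For reference, a rough sketch of the two-stage evaluation pipeline described in the comment above. Both networks here are untrained placeholders (the architectures, class names and input sizes are assumptions, not code from the repo or from the commenter): one network segments the raw game frame, and the policy trained in a simulated clone acts on the segmented observation.

```python
# Hedged sketch of the two-stage pipeline: real frame -> segmentation -> action.
# SegmentationNet and Policy are hypothetical placeholder architectures.
import torch
import torch.nn as nn

class SegmentationNet(nn.Module):
    """Raw frame (3 x H x W) -> per-pixel class map (e.g. bird / pipe / background)."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),          # per-pixel class logits
        )

    def forward(self, x):
        return self.net(x).argmax(dim=1, keepdim=True).float()

class Policy(nn.Module):
    """Segmented observation -> logits over the 2 actions (flap / do nothing)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(2),
        )

    def forward(self, seg):
        return self.net(seg)

def act(frame: torch.Tensor, seg_net: SegmentationNet, policy: Policy) -> int:
    """Evaluation step on the real game: segment the frame, then query the policy."""
    with torch.no_grad():
        seg = seg_net(frame.unsqueeze(0))   # step 1: game frame -> segmentation
        logits = policy(seg)                # step 2: policy trained on the simulated clone
    return int(torch.argmax(logits, dim=1))
```
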

0 Answers