Why does the pendulum have both cos and sin features? Can I use just one of them? Or can I use theta (the angle) instead?
I'd appreciate some explanation for this XD; intuitive or theoretical ones are all welcome.
The angle (theta) is passed through sin() and cos() so that the observations lie in the range [-1, 1]. This fixed range helps stabilise the training of neural networks, which has been explained well here.
You could even use just one of sin() or cos() as your observation. The reason I can think of for using both is that together they give more information about the state. Using both may also lead to faster convergence.
But normalisation of the inputs is necessary, so you cannot just use the raw angles as your state observations for training.
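To make the point about ranges concrete, here is a minimal sketch, assuming NumPy. The helper `encode_angle` is a hypothetical name used for illustration and is not the environment's actual code. It shows that two nearly identical poses on either side of the wrap-around at pi are far apart as raw angles but close together once encoded as (cos, sin).

```python
import numpy as np

def encode_angle(theta, theta_dot):
    """Hypothetical helper: map (theta, theta_dot) to [cos, sin, theta_dot]."""
    return np.array([np.cos(theta), np.sin(theta), theta_dot])

# Two angles just either side of the wrap-around point at pi.
a = np.pi - 0.01
b = -np.pi + 0.01  # physically almost the same pose as `a`

# As raw angles the two poses look far apart (close to 2*pi).
raw_gap = abs(a - b)

# After encoding, the two observations are close together,
# and the cos/sin components always stay inside [-1, 1].
enc_a = encode_angle(a, 0.0)
enc_b = encode_angle(b, 0.0)
enc_gap = np.linalg.norm(enc_a - enc_b)

print(raw_gap, enc_gap)
```

So besides bounding the inputs, the encoding also removes the jump in the raw angle at the wrap-around point, which may be part of why it is preferred over theta itself.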
Edit: Answer to the comment by @CHEN TIANRONG
I ran DDPG with just sin() and theta_dot in one experiment, and with sin(), cos() and theta_dot in another. Clearly, the agent never learns the task in the first experiment.
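One likely reason for the failure with sin() alone, sketched below assuming NumPy (the variable names are illustrative): sin() cannot distinguish theta from pi - theta, so two different poses produce the same observation. Adding cos() removes the ambiguity, and the angle can then be recovered with arctan2.

```python
import numpy as np

theta_a = np.pi / 6          # 30 degrees
theta_b = np.pi - theta_a    # 150 degrees, a different pose

# sin() gives the same value for both poses...
same_sin = np.isclose(np.sin(theta_a), np.sin(theta_b))

# ...while cos() tells them apart.
same_cos = np.isclose(np.cos(theta_a), np.cos(theta_b))

# With both features the angle is recoverable uniquely in (-pi, pi].
recovered_b = np.arctan2(np.sin(theta_b), np.cos(theta_b))

print(same_sin, same_cos, recovered_b)
```

This is only an illustration of the ambiguity, not a reproduction of the DDPG experiments.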
The use of both sin() and cos() is experimental, I guess.
You can find the code I used for the experiments here.
Improving the rate of convergence of neural networks for RL agents is an active area of research. You could look into sample-efficient algorithms, for example: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, Sample Efficient Actor-Critic with Experience Replay, etc.