Questions tagged [policy-gradient-descent]
44 questions
9 votes, 3 answers
What Loss Or Reward Is Backpropagated In Policy Gradients For Reinforcement Learning?
I have made a small script in Python to solve various Gym environments with policy gradients.
import gym, os
import numpy as np
#create environment
env = gym.make('CartPole-v0')
env.reset()
s_size = len(env.reset())
a_size = 2
#import my neural…

asked by S2673 (269)
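In vanilla policy gradients, what gets backpropagated is not the reward itself but the negative log-probability of the chosen action scaled by the return; the reward only enters as that scaling factor. A minimal PyTorch sketch under that assumption (the `policy_net`, `states`, `actions`, and `returns` names are illustrative, not from the question's script):

import torch
import torch.nn.functional as F

def reinforce_loss(policy_net, states, actions, returns):
    """Policy-gradient loss: -mean_t[ log pi(a_t|s_t) * G_t ]."""
    logits = policy_net(states)                    # shape (T, n_actions)
    log_probs = F.log_softmax(logits, dim=-1)      # log pi(.|s_t)
    taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(taken * returns).mean()               # minimize = gradient ascent on return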
7 votes, 0 answers
Why does my agent always take the same action in DQN - Reinforcement Learning
I have trained an RL agent using the DQN algorithm. After 20,000 episodes my rewards converged. Now when I test this agent, it always takes the same action, irrespective of the state. I find this very weird. Can someone help me with this? Is…

asked by chink (1,505)
6 votes, 1 answer
PyTorch PPO implementation for CartPole-v0 getting stuck in local optima
I have implemented PPO for the CartPole-v0 environment. However, it does not converge in certain iterations of the game. Sometimes it gets stuck in local optima. I have implemented the algorithm using the TD(0) advantage, i.e.
A(s_t) = R_{t+1} + \gamma…

asked by 204 (433)
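The truncated expression in the excerpt is presumably the usual TD(0) advantage estimate, A(s_t) = R_{t+1} + gamma * V(s_{t+1}) - V(s_t). A minimal sketch of computing it, assuming a critic called `value_net` (an illustrative name) and tensors of rewards and done flags:

import torch

def td0_advantage(value_net, states, next_states, rewards, dones, gamma=0.99):
    """A(s_t) = r_{t+1} + gamma * V(s_{t+1}) * (1 - done) - V(s_t)."""
    with torch.no_grad():
        v_next = value_net(next_states).squeeze(-1)   # bootstrap target, no gradient
    v = value_net(states).squeeze(-1)
    return rewards + gamma * v_next * (1.0 - dones) - v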
4 votes, 1 answer
DDPG not converging for a simple control problem
I am trying to solve a control problem with DDPG. The problem is simple enough so that I can do value function iteration for its discretized version, and thus I have the "perfect" solution to compare my results with. But I want to solve the problem…

asked by Hypsoline (49)
3 votes, 0 answers
REINFORCE for CartPole: Training Unstable
I am implementing REINFORCE for CartPole-v0. However, the training process is very unstable. I have not implemented 'early stopping' for the environment and allow training to continue for a fixed (high) number of episodes. After a few thousand…

asked by 204 (433)
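One common way to stabilize REINFORCE on CartPole is to standardize the discounted returns before the policy update, which acts as a crude baseline. A sketch of that step (not necessarily what the asker's code does):

import numpy as np

def normalized_returns(rewards, gamma=0.99):
    """Discounted returns, standardized to zero mean and unit variance."""
    returns = np.zeros(len(rewards), dtype=np.float32)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return (returns - returns.mean()) / (returns.std() + 1e-8)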
3 votes, 1 answer
Ray - RLlib - Error with Custom env - continuous action space - DDPG - offline experience training?
Error while using offline experiences for DDPG: the custom environment's dimensions (action space and state space) seem to be inconsistent with what the DDPG RLlib trainer expects.
Ubuntu, Ray 0.7 (latest Ray), DDPG example, offline dataset.…

asked by narasimha.m (61)
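Dimension errors of this kind usually trace back to the spaces declared by the custom environment not matching the recorded offline experiences. A minimal sketch of declaring continuous gym spaces (the shapes and bounds here are placeholders, not taken from the question):

import numpy as np
from gym import spaces

# The declared spaces must match the dimensions of the recorded experiences.
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)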
3 votes, 0 answers
Policy gradient in Keras predicts only one action
I am having trouble with the REINFORCE algorithm in Keras with Atari games. After about 30 episodes the network converges to one action. But the same algorithm works with CartPole-v1 and converges with a mean reward of 495.0 after about 350…

asked by tk338 (176)
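A frequent remedy when a softmax policy collapses onto a single action is adding an entropy bonus to the loss. A hedged PyTorch sketch of the idea (the question itself uses Keras, and the coefficient is illustrative):

import torch
import torch.nn.functional as F

def pg_loss_with_entropy(logits, actions, returns, entropy_coef=0.01):
    """Policy-gradient loss minus an entropy bonus that keeps the policy exploratory."""
    log_probs = F.log_softmax(logits, dim=-1)
    taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return -(taken * returns).mean() - entropy_coef * entropy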
2 votes, 1 answer
Why is `ep_rew_mean` much larger than the reward evaluated by the `evaluate_policy()` function?
I wrote a custom gym environment and trained it with the PPO implementation provided by stable-baselines3. The ep_rew_mean recorded by TensorBoard is as follows:
[figure: the ep_rew_mean curve over 100 million total steps; each episode has 50 steps]
As shown in the figure, the…

asked by Aramiis (21)
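For context, ep_rew_mean is averaged over training rollouts collected with exploration noise, while evaluate_policy runs separate evaluation episodes. A small usage sketch with stable-baselines3 (CartPole stands in for the custom environment):

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)

# deterministic=True disables the action sampling used during training rollouts
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20, deterministic=True)
print(f"evaluated reward: {mean_reward:.2f} +/- {std_reward:.2f}")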
2 votes, 2 answers
How to solve the zero probability problem in the policy gradient?
Recently, I tried to apply the naive policy gradient method to my problem. However, I found that the differences between the outputs of the last layer of the neural network are huge, which means that after applying the softmax layer, only…

asked by HZ-VUW (842)
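One standard way to avoid exact-zero probabilities is to work with logits and log-probabilities directly instead of taking the log of a softmax output. A PyTorch sketch:

import torch
from torch.distributions import Categorical

def sample_action(logits):
    """Sample from a categorical policy without materializing probabilities.

    Categorical(logits=...) uses a numerically stable log-softmax internally,
    so log pi(a|s) does not become -inf through underflow.
    """
    dist = Categorical(logits=logits)
    action = dist.sample()
    return action, dist.log_prob(action)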
2 votes, 1 answer
What are Target Networks in Policy Gradient algorithms in Reinforcement Learning, in simple terms, with some example?
How do they differ from a regular network?
Source Text --> "In DDPG algorithm topology consist of two copies of network weights for each network, (Actor: regular and target) and (Critic: regular and target)"

asked by keshav thosar (27)
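In DDPG the target networks are just slowly updated copies of the regular actor and critic weights, typically maintained by Polyak averaging. A minimal PyTorch sketch of that update:

import torch

@torch.no_grad()
def soft_update(regular_net, target_net, tau=0.005):
    """theta_target <- tau * theta_regular + (1 - tau) * theta_target."""
    for p, p_targ in zip(regular_net.parameters(), target_net.parameters()):
        p_targ.mul_(1.0 - tau).add_(tau * p)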
2 votes, 1 answer
Can the output of the DDPG policy network be a probability distribution instead of a specific action value?
We know that DDPG is a deterministic policy gradient method and the output of its policy network should be a specific action. But once I tried to let the output of the policy network be a probability distribution over several actions, which means the…

asked by JinZ (21)
2 votes, 1 answer
How to accumulate my loss over mini-batches and then calculate my gradient
My main question is: is averaging the loss the same thing as averaging the gradient, and how do I accumulate my loss over mini-batches and then calculate my gradient?
I have been trying to implement policy gradient in TensorFlow and ran into the issue…

asked by Mike Jankowiak (29)
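Because the gradient is linear in the loss, averaging per-mini-batch losses and backpropagating once yields the same update as averaging per-mini-batch gradients. A PyTorch sketch of the accumulation pattern (the question itself is about TensorFlow; this only illustrates the idea):

import torch

def accumulated_step(model, optimizer, loss_fn, mini_batches):
    """Accumulate gradients over several mini-batches, then take one optimizer step."""
    optimizer.zero_grad()
    for batch in mini_batches:
        loss = loss_fn(model, batch) / len(mini_batches)  # average over mini-batches
        loss.backward()                                   # gradients sum across calls
    optimizer.step()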
2 votes, 1 answer
Reward function for Policy Gradient Descent in Reinforcement Learning
I'm currently learning about Policy Gradient Descent in the context of Reinforcement Learning. TL;DR, my question is: "What are the constraints on the reward function (in theory and practice) and what would be a good reward function for the case…

asked by Carsten (4,204)
1 vote, 0 answers
DDPG always choosing the boundary actions
I am trying to implement the DDPG algorithm, taking a state of 8 values and outputting an action of size 4.
The actions are lower bounded by [5, 5, 0, 0] and upper bounded by [40, 40, 15, 15].
When I train my DDPG it always chooses one of the boundaries, for example…

asked by Mohammad Bazzal (11)
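A common pattern for bounded actions is to have the actor output a tanh in [-1, 1] and rescale it to the environment's bounds, rather than clipping raw outputs (clipping lets the gradient keep pushing the actor further into the boundary). A sketch using the bounds from the question:

import numpy as np

LOW = np.array([5.0, 5.0, 0.0, 0.0])
HIGH = np.array([40.0, 40.0, 15.0, 15.0])

def scale_action(tanh_output):
    """Map an actor output in [-1, 1] to the bounds [LOW, HIGH]."""
    return LOW + (tanh_output + 1.0) * 0.5 * (HIGH - LOW)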
1 vote, 0 answers
How to sample actions from a multi-dimensional continuous action space for the REINFORCE algorithm
So, the problem that I am working on can be summarised like this:
The observation space is an 8x1 vector and all are continuous values. Some of them are in the range [-inf, inf] and some are [-360, 360].
The action space is a 4x1 vector. All the…

asked by Rizwan Malik (11)
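For a multi-dimensional continuous action space, REINFORCE is usually parameterized with an independent Gaussian per action dimension, summing the log-probabilities across dimensions. A PyTorch sketch, assuming the policy network outputs a mean and log standard deviation per dimension (names illustrative):

import torch
from torch.distributions import Normal

def sample_continuous_action(mean, log_std):
    """Sample a continuous action from independent Gaussians and return its joint log-prob."""
    dist = Normal(mean, log_std.exp())
    action = dist.sample()                      # e.g. shape (4,)
    log_prob = dist.log_prob(action).sum(-1)    # joint log-prob = sum over dimensions
    return action, log_prob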