Questions tagged [markov-decision-process]
51 questions
46
votes
3 answers
What is a policy in reinforcement learning?
I've seen statements such as:
A policy defines the learning agent's way of behaving at a given time. Roughly
speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
But still didn't fully…

Alexander Cyberman
- 2,114
- 3
- 20
- 21
4
votes
1 answer
Dynamic Programming of Markov Decision Process with Value Iteration
I am learning about MDPs and value iteration through self-study, and I hope someone can improve my understanding.
Consider the problem of a 3-sided die with faces 1, 2, 3. If you roll a 1 or a 2, you win that value in $, but if you roll a 3 you lose…

Sam Hammamy
- 10,819
- 10
- 56
- 94
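One common reading of this dice question is an optimal-stopping MDP: the state is the current bankroll, and at each step you either stop and keep it or roll again, going bust on a 3. Under that assumed reading (the full problem statement is truncated above), a value-iteration sketch might look like:

```python
# Value iteration for one guessed reading of the 3-sided-die problem:
# with bankroll b you may stop (keep b) or roll; rolling 1 or 2 adds that
# amount, rolling a 3 loses everything. The unbounded bankroll is truncated.
CAP = 50

def value_iteration(cap=CAP, tol=1e-9):
    # initialise every state with its "stop" value so the truncated
    # boundary states (b > cap) keep a sensible fixed value
    V = [float(b) for b in range(cap + 3)]
    while True:
        delta = 0.0
        for b in range(cap + 1):
            roll = (V[b + 1] + V[b + 2] + 0.0) / 3.0  # the 0.0 is going bust
            new = max(float(b), roll)                 # stop vs. roll
            delta = max(delta, abs(new - V[b]))
            V[b] = new
        if delta < tol:
            return V

V = value_iteration()
policy = ["roll" if (V[b + 1] + V[b + 2]) / 3.0 > b else "stop"
          for b in range(6)]
print(policy)  # a threshold policy: roll on small bankrolls, then stop
```

With these particular payoffs the threshold works out analytically too: rolling beats stopping roughly while (2b + 3)/3 > b, i.e. for bankrolls below 3.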
3
votes
1 answer
Is Monte Carlo tree search an appropriate method for this problem size (large action/state space)?
I'm doing research on a finite-horizon decision problem with t = 1, ..., 40 periods. In every time step t, the (only) agent has to choose an action a(t) ∈ A(t) while in state s(t) ∈ S(t). The chosen action a(t) in state s(t) affects the…

D. B.
- 49
- 4
3
votes
2 answers
Why do we need exploitation in RL (Q-learning) for convergence?
I am implementing the Q-learning algorithm and I observed that my Q-values are not converging to the optimal Q-values, even though the policy seems to be converging. I defined the action-selection strategy as epsilon-greedy, and epsilon decreases by 1/N…

Aybike
- 33
- 2
3
votes
1 answer
How to solve a deterministic MDP in a non-stationary environment
I am searching for a method to solve a Markov decision process (MDP). I know the transition from one state to another is deterministic, but the environment is non-stationary. This means the reward the agent earns can be different when visiting the…

Thousandsunnies
- 99
- 1
- 5
3
votes
2 answers
State values and state-action values under a policy - Bellman equation with a policy
I am just getting started with deep reinforcement learning and I am trying to grasp this concept.
I have this deterministic Bellman equation.
When I implement stochasticity from the MDP, I get 2.6a.
My equation: is this assumption correct? I saw…

Søren Koch
- 145
- 1
- 1
- 10
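For reference, the two forms being contrasted in questions like this are usually written as follows (notation assumed here: $f$ is the deterministic transition function, $P$ the transition kernel, $\gamma$ the discount factor):

```latex
% Deterministic transitions: the successor s' = f(s, a) is unique
V^{\pi}(s) = R\big(s, \pi(s)\big) + \gamma\, V^{\pi}\!\big(f(s, \pi(s))\big)

% Stochastic MDP: average over actions from the policy and over
% successors drawn from P(s' \mid s, a)
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \Big( R(s, a)
           + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big)
```

The deterministic equation is the special case of the stochastic one where $\pi$ is deterministic and $P(s' \mid s, a)$ puts all its mass on $f(s, a)$.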
2
votes
1 answer
Sequential value iteration in R
I am currently reading Dynamic Programming and Markov Processes by Ronald Howard.
In particular, on page 29 he presents the toymaker example with two different policies, 1 and 2. Each policy has a transition probability matrix and a reward matrix.
# Set up…

Homer Jay Simpson
- 1,043
- 6
- 19
2
votes
0 answers
Value Iteration vs Policy Iteration, which one is faster?
According to this lecture, policy iteration is faster than value iteration.
The reasons are:
Value iteration runs in O(S^2 * A) per iteration, whereas policy iteration computes the values in O(S^2), and only the extraction of the policy runs…

StackExchange123
- 1,871
- 9
- 24
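A toy two-state MDP (made up here purely for illustration) makes the trade-off behind this question concrete: value iteration needs many cheap sweeps, while policy iteration typically needs only a couple of outer iterations, each containing a full policy evaluation (the step that a direct linear solve makes expensive).

```python
# A tiny two-state MDP (hypothetical): action 0 stays in the current
# state, action 1 jumps to the other state. Staying in state 1 pays 1.
GAMMA, TOL = 0.9, 1e-8
R = [[0.0, 0.0], [1.0, 0.0]]          # R[s][a]

def nxt(s, a):
    return s if a == 0 else 1 - s

def value_iteration():
    V, sweeps = [0.0, 0.0], 0
    while True:
        sweeps += 1
        newV = [max(R[s][a] + GAMMA * V[nxt(s, a)] for a in (0, 1))
                for s in (0, 1)]
        if max(abs(newV[s] - V[s]) for s in (0, 1)) < TOL:
            return newV, sweeps
        V = newV

def policy_iteration():
    policy, outer = [0, 0], 0
    while True:
        outer += 1
        # policy evaluation: iterate the linear Bellman backup to tolerance;
        # a direct linear solve here is what makes each iteration costly
        V = [0.0, 0.0]
        while True:
            newV = [R[s][policy[s]] + GAMMA * V[nxt(s, policy[s])]
                    for s in (0, 1)]
            done = max(abs(newV[s] - V[s]) for s in (0, 1)) < TOL
            V = newV
            if done:
                break
        improved = [max((0, 1), key=lambda a: R[s][a] + GAMMA * V[nxt(s, a)])
                    for s in (0, 1)]
        if improved == policy:
            return V, outer
        policy = improved

V_vi, sweeps = value_iteration()
V_pi, outer = policy_iteration()
print(sweeps, outer)  # many cheap sweeps vs. two expensive outer iterations
```

So "faster" depends on what is counted: per-iteration cost favours value iteration, while the number of iterations to a fixed policy strongly favours policy iteration.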
2
votes
0 answers
Coding the Variable Elimination Algorithm for action selection in multi agent MDPs
For my master's thesis I'm trying to code the variable elimination algorithm, in this case applied to multi-agent MDPs. I'm working through this example as a guide while I code:
I don't have any problems…

MuchoG
- 63
- 5
2
votes
2 answers
Why does my Markov chain produce identical sentences from the corpus?
I am using the markovify Markov chain generator in Python, and when using the example code given there it produces a lot of duplicate sentences for me, and I don't know why.
The code is as follows:
import markovify
# Get raw text as string.
with…

Allar
- 85
- 9
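A bare-bones word-level Markov chain in plain Python (not markovify itself, just a sketch of the same idea) shows the usual cause of duplicates: in a small corpus most words have exactly one successor, so generation has almost no branching points and can only replay substrings of the training text.

```python
import random
from collections import defaultdict

# Build a first-order word chain from a deliberately tiny corpus.
corpus = "the cat sat on the mat"
words = corpus.split()

chain = defaultdict(list)
for w1, w2 in zip(words, words[1:]):
    chain[w1].append(w2)

random.seed(1)
out = ["the"]
for _ in range(5):
    successors = chain.get(out[-1])
    if not successors:
        break                       # dead end: word never had a successor
    out.append(random.choice(successors))
print(" ".join(out))
# Here only "the" has two successors ("cat", "mat"); every other word is
# forced, so generated sentences are verbatim fragments of the corpus.
```

The same effect appears with markovify on short input: with too little text, the chain's transition table is nearly deterministic, and its overlap test then rejects most candidates as too similar to the source.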
2
votes
1 answer
no method matching logpdf when sampling from uniform distribution
I am trying to use reinforcement learning in Julia to teach a car that is constantly being accelerated backwards (but with a positive initial velocity) to apply brakes so that it gets as close to a target distance as possible before moving…

Sceptual
- 53
- 7
2
votes
1 answer
N-sided die MDP problem: value iteration solution needed
I'm working on a problem for one of my classes. The problem is this: a person starts with $0 and rolls an N-sided die (N could range from 1 to 30) and wins money according to the side they roll. X sides (ones) of the N-sided die result in…

biofree70
- 21
- 2
2
votes
1 answer
What is a terminal state in gridworld?
I am learning about Markov decision processes, and I don't know where to mark the terminal states.
In the 4x3 grid world, I marked the terminal state that I think is correct (I might be wrong) with T.
[picture]
I saw an instruction marking terminal states as…
user13612530
2
votes
1 answer
MDP Policy Plot for a Maze
I have a 5x5 maze specified as follows.
r = [1 0 1 1 1
1 1 1 0 1
0 1 0 0 1
1 1 1 0 1
1 0 1 0 1];
Where 1's are the paths and 0's are the walls.
Assume I have a function foo(policy_vector, r)…

DeeeeRoy
- 467
- 2
- 5
- 13
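Since the asker's foo(policy_vector, r) is unspecified, here is a guessed, text-based version of such a policy plot, with the MATLAB matrix translated into a Python list of lists. The action encoding (0/1/2/3 for up/down/left/right) and the flat, free-cells-only policy vector are both assumptions about the interface.

```python
# Plot a gridworld policy as arrows over the maze: 1 = path, 0 = wall.
r = [[1, 0, 1, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 1, 0, 0, 1],
     [1, 1, 1, 0, 1],
     [1, 0, 1, 0, 1]]

ARROWS = {0: "^", 1: "v", 2: "<", 3: ">"}  # action encoding is an assumption

def plot_policy(policy_vector, grid):
    """Overlay one arrow per free cell; walls become '#'."""
    rows = []
    k = 0  # index into the flat policy vector (free cells only)
    for row in grid:
        line = []
        for cell in row:
            if cell == 0:
                line.append("#")
            else:
                line.append(ARROWS[policy_vector[k]])
                k += 1
        rows.append(" ".join(line))
    return "\n".join(rows)

n_free = sum(sum(row) for row in r)
demo_policy = [3] * n_free       # dummy policy: always move right
print(plot_policy(demo_policy, r))
```

For a graphical version, the same loop can feed arrow offsets into a quiver-style plot instead of printing characters.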
2
votes
1 answer
What do we mean by "controllable actions" in a POMDP?
I have some questions related to POMDPs.
What do we mean by controllable actions in a partially observable Markov decision process? And what does it mean to have no controllable actions, as in hidden Markov models?
When computing policies through value or policy iteration,…

Anni Sap
- 23
- 4