Questions tagged [markov-decision-process]
51 questions
46
votes
3 answers
What is a policy in reinforcement learning?
I've seen statements such as:
A policy defines the learning agent's way of behaving at a given time. Roughly
speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
But still didn't fully…

Alexander Cyberman
- 2,114
- 3
- 20
- 21
4
votes
1 answer
Dynamic Programming of Markov Decision Process with Value Iteration
I am learning about MDPs and value iteration through self-study, and I hope someone can improve my understanding.
Consider the problem of a 3-sided die with faces 1, 2, 3. If you roll a 1 or a 2, you win that value in $, but if you roll a 3 you lose…

Sam Hammamy
- 10,819
- 10
- 56
- 94
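One common reading of this dice question is an optimal-stopping MDP: the state is the current bankroll, and at each step you either stop and keep it or roll again, going bust on a 3. Under that assumed reading (the full problem statement is truncated above), a value-iteration sketch might look like:

```python
# Value iteration for one guessed reading of the 3-sided-die problem:
# with bankroll b you may stop (keep b) or roll; rolling 1 or 2 adds that
# amount, rolling a 3 loses everything. The unbounded bankroll is truncated.
CAP = 50

def value_iteration(cap=CAP, tol=1e-9):
    # initialise every state with its "stop" value so the truncated
    # boundary states (b > cap) keep a sensible fixed value
    V = [float(b) for b in range(cap + 3)]
    while True:
        delta = 0.0
        for b in range(cap + 1):
            roll = (V[b + 1] + V[b + 2] + 0.0) / 3.0  # the 0.0 is going bust
            new = max(float(b), roll)                 # stop vs. roll
            delta = max(delta, abs(new - V[b]))
            V[b] = new
        if delta < tol:
            return V

V = value_iteration()
policy = ["roll" if (V[b + 1] + V[b + 2]) / 3.0 > b else "stop"
          for b in range(6)]
print(policy)  # a threshold policy: roll on small bankrolls, then stop
```

With these particular payoffs the threshold works out analytically too: rolling beats stopping roughly while (2b + 3)/3 > b, i.e. for bankrolls below 3.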
3
votes
1 answer
Is Monte Carlo tree search an appropriate method for this problem size (large action/state space)?
I'm doing research on a finite-horizon decision problem with t = 1, ..., 40 periods. In every time step t, the (only) agent has to choose an action a(t) ∈ A(t) while in state s(t) ∈ S(t). The chosen action a(t) in state s(t) affects the…

D. B.
- 49
- 4
3
votes
2 answers
Why do we need exploitation in RL (Q-learning) for convergence?
I am implementing the Q-learning algorithm and I observed that my Q-values are not converging to the optimal Q-values, even though the policy seems to be converging. I defined the action-selection strategy as epsilon-greedy, and epsilon decreases by 1/N…

Aybike
- 33
- 2
3
votes
1 answer
How to solve a deterministic MDP in a non-stationary environment
I am searching for a method to solve a Markov decision process (MDP). I know the transition from one state to another is deterministic, but the environment is non-stationary. This means the reward the agent earns can be different when visiting the…

Thousandsunnies
- 99
- 1
- 5
3
votes
2 answers
State values and state-action values under a policy - Bellman equation with a policy
I am just getting started with deep reinforcement learning and I am trying to grasp this concept.
I have this deterministic Bellman equation.
When I implement stochasticity from the MDP, I get 2.6a.
My equation: is this assumption correct? I saw…

Søren Koch
- 145
- 1
- 1
- 10
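For reference, the two forms being contrasted in questions like this are usually written as follows (notation assumed here: $f$ is the deterministic transition function, $P$ the transition kernel, $\gamma$ the discount factor):

```latex
% Deterministic transitions: the successor s' = f(s, a) is unique
V^{\pi}(s) = R\big(s, \pi(s)\big) + \gamma\, V^{\pi}\!\big(f(s, \pi(s))\big)

% Stochastic MDP: average over actions from the policy and over
% successors drawn from P(s' \mid s, a)
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \Big( R(s, a)
           + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big)
```

The deterministic equation is the special case of the stochastic one where $\pi$ is deterministic and $P(s' \mid s, a)$ puts all its mass on $f(s, a)$.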
2
votes
1 answer
Sequential value iteration in R
I am currently reading Dynamic Programming and Markov Processes by Ronald Howard.
In particular, on page 29 he presents the toymaker example with two different policies, 1 and 2. Each policy has a transition probability matrix and a reward matrix.
# Set up…

Homer Jay Simpson
- 1,043
- 6
- 19
2
votes
0 answers
Value Iteration vs Policy Iteration, which one is faster?
According to this lecture, policy iteration is faster than value iteration.
The reasons are:
Value iteration runs in O(S^2 * A) per iteration, whereas policy iteration computes the values in O(S^2), and only the extraction of the policy runs…

StackExchange123
- 1,871
- 9
- 24
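A toy two-state MDP (made up here purely for illustration) makes the trade-off behind this question concrete: value iteration needs many cheap sweeps, while policy iteration typically needs only a couple of outer iterations, each containing a full policy evaluation (the step that a direct linear solve makes expensive).

```python
# A tiny two-state MDP (hypothetical): action 0 stays in the current
# state, action 1 jumps to the other state. Staying in state 1 pays 1.
GAMMA, TOL = 0.9, 1e-8
R = [[0.0, 0.0], [1.0, 0.0]]          # R[s][a]

def nxt(s, a):
    return s if a == 0 else 1 - s

def value_iteration():
    V, sweeps = [0.0, 0.0], 0
    while True:
        sweeps += 1
        newV = [max(R[s][a] + GAMMA * V[nxt(s, a)] for a in (0, 1))
                for s in (0, 1)]
        if max(abs(newV[s] - V[s]) for s in (0, 1)) < TOL:
            return newV, sweeps
        V = newV

def policy_iteration():
    policy, outer = [0, 0], 0
    while True:
        outer += 1
        # policy evaluation: iterate the linear Bellman backup to tolerance;
        # a direct linear solve here is what makes each iteration costly
        V = [0.0, 0.0]
        while True:
            newV = [R[s][policy[s]] + GAMMA * V[nxt(s, policy[s])]
                    for s in (0, 1)]
            done = max(abs(newV[s] - V[s]) for s in (0, 1)) < TOL
            V = newV
            if done:
                break
        improved = [max((0, 1), key=lambda a: R[s][a] + GAMMA * V[nxt(s, a)])
                    for s in (0, 1)]
        if improved == policy:
            return V, outer
        policy = improved

V_vi, sweeps = value_iteration()
V_pi, outer = policy_iteration()
print(sweeps, outer)  # many cheap sweeps vs. two expensive outer iterations
```

So "faster" depends on what is counted: per-iteration cost favours value iteration, while the number of iterations to a fixed policy strongly favours policy iteration.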
2
votes
0 answers
Coding the Variable Elimination Algorithm for action selection in multi agent MDPs
For my master's thesis I'm trying to code the variable elimination algorithm, in this case applied to multi-agent MDPs. I'm working through this example as a guide while I code:
I don't have any problems…

MuchoG
- 63
- 5
2
votes
2 answers
Why does my Markov chain produce identical sentences from the corpus?
I am using the markovify Markov chain generator in Python, and when using the example code given there it produces a lot of duplicate sentences for me, and I don't know why.
The code is as follows:
import markovify
# Get raw text as string.
with…

Allar
- 85
- 9
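A bare-bones word-level Markov chain in plain Python (not markovify itself, just a sketch of the same idea) shows the usual cause of duplicates: in a small corpus most words have exactly one successor, so generation has almost no branching points and can only replay substrings of the training text.

```python
import random
from collections import defaultdict

# Build a first-order word chain from a deliberately tiny corpus.
corpus = "the cat sat on the mat"
words = corpus.split()

chain = defaultdict(list)
for w1, w2 in zip(words, words[1:]):
    chain[w1].append(w2)

random.seed(1)
out = ["the"]
for _ in range(5):
    successors = chain.get(out[-1])
    if not successors:
        break                       # dead end: word never had a successor
    out.append(random.choice(successors))
print(" ".join(out))
# Here only "the" has two successors ("cat", "mat"); every other word is
# forced, so generated sentences are verbatim fragments of the corpus.
```

The same effect appears with markovify on short input: with too little text, the chain's transition table is nearly deterministic, and its overlap test then rejects most candidates as too similar to the source.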
2
votes
1 answer
no method matching logpdf when sampling from uniform distribution
I am trying to use reinforcement learning in Julia to teach a car that is constantly being accelerated backwards (but with a positive initial velocity) to apply brakes so that it gets as close to a target distance as possible before moving…

Sceptual
- 53
- 7
2
votes
1 answer
N-sided die MDP problem: value iteration solution needed
I'm working on a problem for one of my classes. The problem is this: a person starts with $0 and rolls an N-sided die (N could range from 1 to 30) and wins money according to the side they roll. X sides (ones) of the N-sided die result in…

biofree70
- 21
- 2
2
votes
1 answer
What is a terminal state in gridworld?
I am learning about Markov decision processes, and I don't know where to mark the terminal states.
In the 4x3 grid world, I marked the terminal state that I think is correct (I might be wrong) with T.
[picture]
I saw an instruction marking terminal states as…
user13612530
2
votes
1 answer
MDP Policy Plot for a Maze
I have a 5x5 maze specified as follows.
r = [1 0 1 1 1
1 1 1 0 1
0 1 0 0 1
1 1 1 0 1
1 0 1 0 1];
Where 1's are the paths and 0's are the walls.
Assume I have a function foo(policy_vector, r)…

DeeeeRoy
- 467
- 2
- 5
- 13
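Since the asker's foo(policy_vector, r) is unspecified, here is a guessed, text-based version of such a policy plot, with the MATLAB matrix translated into a Python list of lists. The action encoding (0/1/2/3 for up/down/left/right) and the flat, free-cells-only policy vector are both assumptions about the interface.

```python
# Plot a gridworld policy as arrows over the maze: 1 = path, 0 = wall.
r = [[1, 0, 1, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 1, 0, 0, 1],
     [1, 1, 1, 0, 1],
     [1, 0, 1, 0, 1]]

ARROWS = {0: "^", 1: "v", 2: "<", 3: ">"}  # action encoding is an assumption

def plot_policy(policy_vector, grid):
    """Overlay one arrow per free cell; walls become '#'."""
    rows = []
    k = 0  # index into the flat policy vector (free cells only)
    for row in grid:
        line = []
        for cell in row:
            if cell == 0:
                line.append("#")
            else:
                line.append(ARROWS[policy_vector[k]])
                k += 1
        rows.append(" ".join(line))
    return "\n".join(rows)

n_free = sum(sum(row) for row in r)
demo_policy = [3] * n_free       # dummy policy: always move right
print(plot_policy(demo_policy, r))
```

For a graphical version, the same loop can feed arrow offsets into a quiver-style plot instead of printing characters.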
2
votes
1 answer
What do we mean by "controllable actions" in a POMDP?
I have some questions related to POMDPs.
What do we mean by controllable actions in a partially observable Markov decision process? And what does it mean to have no controllable actions, as in hidden Markov models?
When computing policies through value or policy iteration,…

Anni Sap
- 23
- 4