Questions tagged [value-iteration]

19 questions
140
votes
5 answers

What is the difference between value iteration and policy iteration?

In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration, you use the Bellman equation to solve for the optimal policy, whereas in policy iteration, you randomly…
4
votes
1 answer

Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I didn’t understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled: Should/can it…
4
votes
1 answer

Dynamic Programming of Markov Decision Process with Value Iteration

I am learning about MDPs and value iteration through self-study, and I hope someone can improve my understanding. Consider the problem of a 3-sided die with the numbers 1, 2, 3. If you roll a 1 or a 2, you get that value in $, but if you roll a 3, you lose…
2
votes
3 answers

In a df with multiple observations for each ID, how to conditionally find date according to another variable?

This is the first question I have asked here, so I hope I am doing this correctly! I have a dataset with millions of observations. Each row is a drug prescription picked up by different individuals on different dates, with each individual appearing multiple…
2
votes
1 answer

Is there a clever way to get rid of these loops using numpy?

I'm reaching the maximum recursion depth, and I've been trying to use np.tensordot(), but I couldn't really get an insight into how to use it in this case. def stopping_condtion(a,V,V_old,eps): return np.max(la.norm(V - V_old)) < ((1 - a) * eps) /…
Max
2
votes
1 answer

Why is Policy Iteration faster than Value Iteration?

We know that policy iteration gives us the policy directly and is therefore faster, but can anyone explain this with some examples?
shmi
2
votes
2 answers

How to solve reinforcement learning grid world examples using value iteration?

I can find either theory or Python examples, neither of which is satisfactory for a beginner. I just need a simple example to understand the step-by-step iterations. Could anyone please show me the 1st and 2nd iterations for the image that I…
Ahasan Ratul
2
votes
0 answers

Modelling the profitability of credit cards by a Markov Decision Process

This is with reference to a published paper on modelling the profitability of credit cards by Markov Decision Processes. I am trying to implement the same in Python using Mdptoolbox but am not getting the output in the expected format. My states are the…
1
vote
2 answers

Population growth math issue in C

I have looked this over and am wondering where my math issue is. I believe that it should be calculating correctly, but the floats do not round up (.75 to 1) to add to the count for births/deaths. I am a novice at C. Here is the code I have so…
Lee
1
vote
1 answer

Why are policy-iteration and value-iteration methods giving different results for optimal values and optimal policy?

I am currently studying dynamic programming in reinforcement learning, where I came across two concepts: value iteration and policy iteration. To understand them, I am implementing the gridworld example from Sutton, which says: The…
1
vote
0 answers

Faster accessing 2D numpy/array or Large 1D numpy/array

I am performing prioritized sweeping, for which I have a matrix of 1000*1000 cells (a gridworld) that I have to access repeatedly in a while true loop for assignment (I am not essentially iterating over the list, but all cells are accessed…
SH_V95
0
votes
0 answers

How to iterate through a nested dictionary to find a specific value given a list whose elements are keys?

I'm trying to write a function that accepts two parameters - a nested dictionary and a list whose elements are a subset of the keys (missing keys aren't considered). Using the list's elements, the function has to iterate over the dictionary to find…
0
votes
0 answers

Plotting the determinant of a 2-by-2 matrix with respect to an independent variable in MATLAB

Thanks in advance. I'm plotting a figure in MATLAB but I'm facing difficulty. My code is: clear clc n2=1.45:1.5; x=0:100;k2z=6.28.*n2./x;a=0.01;u1=1;u2=1;k1z=6.28./x; for j=1:numel(k1z) for i=1:numel(k2z) …
0
votes
0 answers

How to make my for loop work in openpyxl?

Hi, I am in the process of writing Python code to input, search, and edit data in Excel. My chosen format for the UI is PySimpleGUI and my xlsx package is openpyxl. I have been coding for approximately 3 weeks. I probably bit off more than I can chew…
0
votes
2 answers

Declare a javascript object between brackets to choose only the element corresponding to its index

I found this sample in a book, and this is the first time I have seen this notation. Obviously it's a thousand times shorter than writing a switch, but what is it? When I do typeof(status) it returns undefined. I would like to understand what it is so…