I am working on a deep reinforcement learning problem, and the state I need to feed to my agent is stored in a flat vector of numbers (mostly 0/1 flags).
The list looks like this:
[7.0, 1.0, 1.0, 0.0, 1.0, 5.0, 0.0, 1.0, 0.0, 1.0,
7.0, 1.0, 1.0, 0.0, 1.0, 6.0, 1.0, 0.0, 1.0, 0.0]
However, each complete state for my problem consists of 5 consecutive values, so a new state starts at every 5th element. The complete states in this sample data are:
[[7. 1. 1. 0. 1.]]
[[5. 0. 1. 0. 1.]]
[[7. 1. 1. 0. 1.]]
[[6. 1. 0. 1. 0.]]
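Since each complete state is just a contiguous run of 5 values, one way to check the intended split is to reshape the vector into one row per state. This is a minimal NumPy sketch that assumes the vector length is always a multiple of 5; `STATE_SIZE` is a name introduced here for illustration.

```python
import numpy as np

# Sample data from the question: 20 values, i.e. 4 consecutive states of 5 values each.
data = np.array([7.0, 1.0, 1.0, 0.0, 1.0, 5.0, 0.0, 1.0, 0.0, 1.0,
                 7.0, 1.0, 1.0, 0.0, 1.0, 6.0, 1.0, 0.0, 1.0, 0.0])

STATE_SIZE = 5  # assumed fixed number of values per complete state

# reshape requires len(data) to be a multiple of STATE_SIZE; row i is state i.
states = data.reshape(-1, STATE_SIZE)

for i, state in enumerate(states):
    print(i, state)
```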
I have tried writing a parser function, similar to a sliding window, that should capture the next 5 values every 5th element:
import numpy as np

def getState(data, timestep, window):
    parser_start = timestep - window + 1
    block = data[parser_start:timestep + 5] if parser_start >= 0 else data[0:timestep + 5]  # pad with t0
    res = []
    for i in range(window - 1):
        res.append(block[i])
    return np.array([res])
which I then call in a loop like this:
window_size = 5
for t in range(10):
    next_state = getState(data, t + 4, window_size + 1)
    print(next_state)
However, when I run the loop, the result I get is:
[[7. 1. 1. 0. 1.]]
[[1. 1. 0. 1. 5.]]
[[1. 0. 1. 5. 0.]]
[[0. 1. 5. 0. 1.]]
[[1. 5. 0. 1. 0.]]
[[5. 0. 1. 0. 1.]]
[[0. 1. 0. 1. 7.]]
[[1. 0. 1. 7. 1.]]
[[0. 1. 7. 1. 1.]]
[[1. 7. 1. 1. 0.]]
The window seems to advance by 1 element per iteration rather than by 5, so consecutive states overlap. I have been trying to fix this for weeks, but I can't find where the problem is.
Do you guys have any fresh ideas?
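In `getState`, the slice starts at `timestep - window + 1`; since the loop passes `t + 4` as `timestep` and `window_size + 1` (6) as `window`, that start works out to `t - 1`, so each increment of `t` moves the window by one element. A minimal sketch of a fix, assuming states never overlap, is to compute the start as `index * state_size` instead. `get_state` is a hypothetical replacement for `getState`, and the loop bound is derived from the data length (4 states in the sample) rather than hard-coded as `range(10)`, which would run past the end of a 20-element vector.

```python
import numpy as np

data = np.array([7.0, 1.0, 1.0, 0.0, 1.0, 5.0, 0.0, 1.0, 0.0, 1.0,
                 7.0, 1.0, 1.0, 0.0, 1.0, 6.0, 1.0, 0.0, 1.0, 0.0])


def get_state(data, index, state_size=5):
    """Hypothetical helper: return state number `index` as a (1, state_size) array.

    State i occupies data[i * state_size : (i + 1) * state_size], so the start
    offset advances by state_size (not by 1) each time index grows by 1.
    """
    start = index * state_size
    end = start + state_size
    if end > len(data):
        raise IndexError(
            f"state {index} needs data[{start}:{end}], but len(data) is {len(data)}"
        )
    return np.array([data[start:end]])


window_size = 5
n_states = len(data) // window_size  # number of complete states available
for t in range(n_states):
    next_state = get_state(data, t, window_size)
    print(next_state)
```

Raising `IndexError` when the slice would run off the end makes a wrong loop bound fail loudly instead of silently returning a short state.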