I am searching for a method to solve a Markov Decision Process (MDP). I know the transitions between states are deterministic, but the environment is non-stationary. This means the reward the agent earns can be different when visiting the same state again. Is there an algorithm, like Q-learning or SARSA, that I can use for my problem?
1 Answer
In theory, this is a very difficult problem. That is, it will be very difficult to find an algorithm with theoretical proofs of convergence to an (optimal) solution.
In practice, any standard RL algorithm (like those you named) may be just fine, as long as the environment is not "too non-stationary". By that I mean it will likely work fine in practice if your environment doesn't change too rapidly, suddenly, or often. You may wish to use a slightly higher exploration rate and/or a higher learning rate than you would in a stationary setting, because you need to be able to keep learning, and more recent experiences will be more informative than older ones.
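For concreteness, here is a minimal tabular Q-learning sketch along those lines, using a constant (rather than decaying) learning rate and a constant exploration rate so the agent keeps tracking a drifting reward signal. The `env` interface (`reset()`/`step()`), the parameter values, and the assumption that states are hashable are all illustrative, not something from your question:

```python
# Minimal tabular Q-learning sketch for a non-stationary environment.
# The `env` interface (reset/step) is an assumption, modeled on the
# common Gym-style API; adapt it to your own environment.
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500,
               alpha=0.5,    # constant, relatively high learning rate:
                             # recent experience keeps overriding old estimates
               epsilon=0.2,  # constant, relatively high exploration rate:
                             # keep re-sampling states whose rewards may change
               gamma=0.99):
    # Q-table: maps each (hashable) state to a list of action values
    Q = defaultdict(lambda: [0.0] * n_actions)

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])

            next_state, reward, done = env.step(action)

            # Constant-alpha update: unlike the decaying step sizes needed
            # for convergence proofs in the stationary case, a constant
            # alpha lets the estimates track a changing reward function.
            target = reward + gamma * max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

The key design choice here is the constant alpha: with a properly decaying learning rate the estimates eventually freeze (which is what the stationary convergence theory wants), and that is exactly what you don't want when the rewards keep changing.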

Dennis Soemers
But where is the line between stationary enough and not stationary enough? What exactly is too often: every timestep, or every fifth or tenth timestep? I think it all depends on the design of my MDP. So I want to make sure that I will find an optimal solution, and I want to be able to explain why my definitions of states and actions are the way they are. What about the deterministic part? Does it have any influence on the solution? – Thousandsunnies Mar 11 '18 at 10:05
@Thousandsunnies It's impossible to tell where the line is, really. I was already specifically talking about practice/empirics, not theory. If you're talking about the general RL setting (which I assumed due to the mention of Q-learning / SARSA), where the MDP's properties (like the transition matrix) are not known and you can only learn from experience, there's little more to say theoretically. If you actually do know all the properties of the MDP, that might change things, but then we'd need to know all the precise, formal details. – Dennis Soemers Mar 11 '18 at 10:36