I'm trying to frame a problem as a reinforcement learning problem. My difficulty is that the state the agent is in changes randomly. The agent simply has to choose an action in whatever state it finds itself. I want to learn appropriate actions for all states, based on the reward the agent receives for performing each action.
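To make the setup concrete, here is a minimal sketch of what I have in mind. The reward table, the uniform state distribution, the epsilon-greedy rule, and the name `run_sketch` are all assumptions made up for illustration; my real problem may differ on each of these.

```python
import random

def run_sketch(n_states=3, n_actions=2, steps=5000, epsilon=0.1, seed=0):
    """Toy loop: random state, choose an action, observe a reward, update."""
    rng = random.Random(seed)

    # Hypothetical reward: action (s % n_actions) is best in state s, plus noise.
    def reward(s, a):
        return (1.0 if a == s % n_actions else 0.0) + rng.gauss(0, 0.1)

    q = [[0.0] * n_actions for _ in range(n_states)]     # reward estimates
    counts = [[0] * n_actions for _ in range(n_states)]  # visit counts

    for _ in range(steps):
        # Assumption: the state is drawn at random, regardless of the last action.
        s = rng.randrange(n_states)

        # Epsilon-greedy choice within the current state.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: q[s][i])

        r = reward(s, a)
        counts[s][a] += 1
        # Running average of the immediate reward for (state, action).
        q[s][a] += (r - q[s][a]) / counts[s][a]

    return q

q = run_sketch()
# Greedy action per state according to the learned estimates.
greedy = [max(range(len(row)), key=lambda i: row[i]) for row in q]
```

In this sketch each estimate is just a running average of the immediate reward for a state-action pair; nothing in the update refers to a next state.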
Question:
Is this a specific type of RL problem? And if there is no successor state, how would one calculate the value of a state?