I am thinking to implement a learning strategy for different types of agents in my model. To be honest, I still do not know what kind of questions should I ask first or where to start.
I have two types of agents which I want them to learn by experience, they have a pool of actions which each has different reward based on specific situations that might happen. I am new to reinforcement Learning methods, therefore any suggestions on what kind of questions should I ask myself is welcomed :)
Here is how I am going forward to formulate my problem:
- Agents have a lifetime and they keep track of a few things that matter for them and these indicators are different for different agents, for example, one agent wants to increase A another wants B more than A.
- States are points in an agent's lifetime which they Have more than one option (I do not have a clear definition for States as they might happen a few times or not happen at all because Agents move around and they might never face a situation)
- The reward is the an increase or decrease in an indicator that agents can get from an action in a specific State, and agent do not know what would be the gain if he chose another action.
- The gain is not constant, the states are not well defined and there is no formal transition of one state into another,
- For example agent can decide to share with one of the co-located agent (Action 1) or with all of the agents at the same location(Action 2) If certain conditions hold true Action A will be more rewarding for that agent, while in other conditions Action 2 will have higher reward; my problem is I did not see any example with unknown rewards since sharing in this scenario also depends on the other agent's characteristics (which affects the conditions of reward system) and in different states it will be different.
In my model there is no relationship between the action and the following state,and that makes me wonder if its ok to think about RL in this situation at all.
What I am looking to optimize here is the ability for my agents to reason about current situation in a better way and not only respond to their need which is triggered by their internal states. They have a few personalities which can define their long term goal and can affect their decision making in different situations, but I want them to remember what action in a situation helped them to increase their preferred long term goal.