
I am thinking of implementing a learning strategy for the different types of agents in my model. To be honest, I still do not know what kind of questions I should ask first, or where to start.

I have two types of agents that I want to learn from experience. They have a pool of actions, each of which has a different reward depending on the specific situation that might happen. I am new to reinforcement learning methods, so any suggestions on what kind of questions I should ask myself are welcome :)

Here is how I am planning to formulate my problem:

  1. Agents have a lifetime and keep track of a few indicators that matter to them; these indicators differ between agents. For example, one agent wants to increase A, while another wants B more than A.
  2. States are points in an agent's lifetime at which it has more than one option. (I do not have a clear definition of states, since a given situation might occur a few times or never, because agents move around and may never face it.)
  3. The reward is the increase or decrease in an indicator that an agent gets from an action in a specific state, and the agent does not know what the gain would have been had it chosen another action.
  4. The gain is not constant, the states are not well defined, and there is no formal transition from one state to another.
  5. For example, an agent can decide to share with one of the co-located agents (Action 1) or with all of the agents at the same location (Action 2). If certain conditions hold, Action 1 is more rewarding for that agent, while under other conditions Action 2 has the higher reward. My problem is that I have not seen any example with unknown rewards: the outcome of sharing in this scenario also depends on the other agents' characteristics (which affect the conditions of the reward system), and it will be different in different states.

In my model there is no relationship between an action and the following state, and that makes me wonder whether it is OK to think about RL in this situation at all.

What I want to optimize is my agents' ability to reason about their current situation in a better way, rather than only responding to needs triggered by their internal states. They have a few personality traits that define their long-term goals and affect their decision making in different situations, but I want them to remember which action, in which situation, helped them advance their preferred long-term goal.

Marzy

1 Answer


In my model there is no relationship between an action and the following state, and that makes me wonder whether it is OK to think about RL in this situation at all.

This seems strange. What do actions do if they do not change state? Note that agents don't necessarily have to know how their actions will change their state. Similarly, actions could change the state imperfectly (a robot's treads could skid so it doesn't actually move when it tries to). In fact, some algorithms are specifically designed for this kind of uncertainty.

In any case, even if the agents are moving around the state space without having any control, they can still learn the rewards for the different states. Indeed, many RL algorithms involve moving around the state space semi-randomly to figure out what the rewards are.
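For instance, many simple tabular methods combine that semi-random exploration with a greedy choice based on the values learned so far (epsilon-greedy). Here is a minimal sketch in Python, assuming an illustrative dictionary of estimated values; none of these names come from your model:

```python
import random

def choose_action(q_values, state, actions, epsilon=0.1):
    """Epsilon-greedy selection: explore with probability epsilon, otherwise exploit.

    q_values maps (state, action) -> estimated reward learned so far;
    pairs the agent has never tried default to 0.
    """
    if random.random() < epsilon:
        return random.choice(actions)  # explore: try a random action
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))  # exploit
```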

I do not have a clear definition of states, since a given situation might occur a few times or never, because agents move around and may never face it

You might consider expanding what goes into what you consider to be a "state". For instance, the position seems like it should definitely go into the variables identifying a state. Not all states need to have rewards (although good RL algorithms typically infer a measure of goodness of neutral states).

I would recommend clearly defining the variables that determine an agent's state. For instance, the state space could be current-patch X internal-variable-value X other-agents-present. In the simplest case, the agent can observe all of the variables that make up their state. However, there are algorithms that don't require this. An agent should always be in a state, even if the state has no reward value.
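To make that concrete, here is a hedged sketch in Python of how such a state could be represented as a plain tuple and used to key a table of learned values (the variable names are hypothetical, just mirroring the example above):

```python
from collections import defaultdict

def make_state(current_patch, internal_value, other_agents_present):
    """Bundle the observable variables into one hashable state identifier."""
    return (current_patch, internal_value, other_agents_present)

# Learned value estimates, keyed by (state, action); unseen pairs start at 0.
q_table = defaultdict(float)

state = make_state(current_patch=(3, 7), internal_value="low-energy", other_agents_present=True)
print(q_table[(state, "share-with-one")])  # 0.0 until this pair has actually been experienced
```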

Now, concerning the unknown rewards: that's actually totally okay. Reward can be a random variable. In that case, a simple way to apply standard RL algorithms would be to use the expected value of the variable when making decisions. If the distribution is unknown, then the algorithm could just use the mean of the rewards observed so far.
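A hedged sketch of that last idea, keeping a running mean of the rewards observed for each (state, action) pair (a sample-average estimate; the names are illustrative):

```python
from collections import defaultdict

reward_sum = defaultdict(float)   # total reward observed per (state, action)
reward_count = defaultdict(int)   # how many times each pair has been tried

def record_reward(state, action, reward):
    """Update the running statistics after the action's outcome is observed."""
    reward_sum[(state, action)] += reward
    reward_count[(state, action)] += 1

def expected_reward(state, action):
    """Mean of the rewards seen so far; 0 if the pair is still untried."""
    n = reward_count[(state, action)]
    return reward_sum[(state, action)] / n if n else 0.0
```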

Alternatively, you could include the variables that determine the reward in the definition of the state. That way, if the reward changes, the agent is literally in a different state. For example, suppose a robot is on top of a building. It needs to get to the top of the building in front of it. If it just moves forward, it falls to the ground. Thus, that state has a very low reward. However, if it first places a plank that goes from one building to the other, and then moves forward, the reward changes. To represent this, we could include plank-in-place as a variable, so that putting the board in place actually changes the robot's current state and the state that would result from moving forward. Thus, the reward itself has not changed; the robot is just in a different state.
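A tiny sketch of that encoding (again with hypothetical names): the extra variable is just one more component of the state, so placing the plank moves the robot to a different state rather than changing the reward of the old one:

```python
def robot_state(position, plank_in_place):
    """The plank's status is part of the state, not part of the reward function."""
    return (position, plank_in_place)

def reward(state, action):
    position, plank_in_place = state
    if position == "rooftop" and action == "move-forward":
        return 10.0 if plank_in_place else -100.0  # cross safely vs. fall to the ground
    return 0.0
```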

Hopefully this helps!

UPDATE 2/7/2018: A recent upvote reminded me of the existence of this question. In the years since it was asked, I've actually dived into RL in NetLogo to a much greater extent. In particular, I've made a Python extension for NetLogo, primarily to make it easier to integrate machine learning algorithms with a model. One of the demos of the extension trains a collection of agents using deep Q-learning as the model runs.
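The demo itself lives with the extension, but as a rough illustration of what "deep Q-learning" means here, this is a minimal sketch in Python with PyTorch; it is a generic skeleton with made-up sizes, not the extension's actual code:

```python
import random
import torch
import torch.nn as nn

N_FEATURES, N_ACTIONS = 4, 3  # illustrative sizes only

# A small network replaces the Q-table: state features in, one Q-value per action out.
q_net = nn.Sequential(nn.Linear(N_FEATURES, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy choice over the network's Q-value estimates."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())

def train_step(state, action, reward, next_state, gamma=0.95):
    """One gradient step toward the one-step Q-learning target."""
    q = q_net(torch.tensor(state, dtype=torch.float32))[action]
    with torch.no_grad():
        target = reward + gamma * q_net(torch.tensor(next_state, dtype=torch.float32)).max()
    loss = (q - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```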

Bryan Head
  • Thank you very much. As I said, I do not have a clear definition of state. My simulation is concerned with social reciprocity exchanges such as sharing, stealing, or doing nothing, but not all of these actions are available to all agents: based on their internal state, some usually share and others steal. There is also a range of different actions for each act; for example, they can share only with in-group or only with out-group agents, or they can decide to steal from the out-group only. The decision made impacts the agent's reputation and self-satisfaction – Marzy Jan 27 '14 at 01:59
  • Your answer helps a lot, since I am new to RL and I was not sure what kind of questions I should ask myself :D – Marzy Jan 27 '14 at 02:02
  • I'm glad it helps! Concerning the availability of the actions: the available actions should be perfectly determined by the state. Remember, internal variables can be included in the state. So, if an agent is predisposed to not steal, then that predisposition is part of the state, and that action is not available. Similarly, suppose an agent is close to an out-group agent and thus can steal. That's one state. Now suppose the agent is close to an in-group agent and thus cannot steal. That's a different state. In this way, states determine what actions an agent can take (see the short sketch after these comments). – Bryan Head Jan 27 '14 at 02:13
  • Thanks again I will consider this :) – Marzy Jan 27 '14 at 03:10
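As a small sketch of the point made in the comments above (hypothetical names, not the actual model): the set of legal actions can be computed directly from the state, with the agent's internal predisposition included as part of that state:

```python
def available_actions(state):
    """Return the actions legal in this state; internal traits are part of the state."""
    predisposition, neighbor_group = state  # e.g. ("never-steals", "out-group")
    actions = ["do-nothing", "share"]
    if neighbor_group == "out-group" and predisposition != "never-steals":
        actions.append("steal")
    return actions

print(available_actions(("never-steals", "out-group")))  # ['do-nothing', 'share']
print(available_actions(("opportunist", "out-group")))   # ['do-nothing', 'share', 'steal']
```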