How to optimize a simulation metric with deep learning without target values?

Question

I am trying to train an RNN that takes a demand matrix as input and outputs bus routes. The routes are then fed to a simulation, which returns a metric of how well they performed. Since there are no target bus routes, how do I backpropagate the simulation result?

To illustrate the question with simple Python code:

"""
The model is an RNN that takes a (400, 24, 24) matrix as input.

Dimension 0 is time, dimension 1 is the departure bus stop, and dimension 2 is the arrival bus stop. Each value counts the passengers who departed from a given stop at a given time, heading for a given arrival stop.

The output is a (64, 24) matrix, which is reshaped to (8, 8, 24).
Dimension 0 is the sequence index, dimension 1 is the bus index (there are 8 buses), and dimension 2 holds the softmax over the 24 bus stops. From the output, a sequence of 8 bus stops is picked for each bus.

These sequences are then used to generate the bus paths, which are evaluated by a simulation.
"""

model.train()
optimizer.zero_grad()
out = model(demand)  # out is (64, 24); demand is (400, 24, 24)
demand, performance = simulation(out)  # assume performance is a float
# here out has a grad_fn but performance does not
loss = SOME_NUMBER - performance
loss = torch.tensor(loss)  # built from a plain float, so it has no grad_fn
# here I need to backpropagate, and this is the confusing part

# simply calling loss.backward() does not update any weights because there is no grad_fn

# out.backward() requires (64, 24) gradients computed somehow from 1 metric,
# and causes complete divergence within a few steps

optimizer.step()

It is not the target though, it is just a metric of performance. The target would be the optimal bus routes for the given demand, i.e. the true value, which I do not have here. I do not know of any loss function that lets me backpropagate using the metric alone. — Dogukan Avci, Nov 04 '19 at 07:09
If you have a metric of performance, at least you can use that. That is, try to maximize it using a loss function based on that. Otherwise, you can look at Reinforcement Learning algos (like @JuanDavid's answer suggests). — akshayk07, Nov 04 '19 at 07:27
That is the issue: how do I backpropagate the performance? What is unclear to me is how to generate an error from a metric rather than from the distance to a target. — Dogukan Avci, Nov 04 '19 at 07:47
If maximizing the metric does the job (at least to some extent), and suppose your metric maximum value is 100 (which is true if it is a percentage), then your loss function can be `(100 - metric)` or just `-metric`. This, however, requires your metric to be differentiable (or you can approximate it to make it differentiable). Note that minimizing `-metric` i.e. negative of the metric will also work because 100 (max. value) is just a constant and won't affect the gradients (since derivative of that is zero). Also, minimizing `-metric` will be equivalent to maximizing `metric`. — akshayk07, Nov 04 '19 at 08:03
Yes, I can turn the metric into an error and minimize it, but how do I backpropagate that? — Dogukan Avci, Nov 04 '19 at 08:09
PyTorch's inbuilt automatic differentiation (calling backward on the loss) will work as long as the loss function is differentiable. — akshayk07, Nov 04 '19 at 09:25
I have tried doing that. The problem is that the loss does not have a grad_fn property (because it is just a tensor created from data), so the weights do not get updated. Without that property, autograd has no history of operations to differentiate through. If you have a small example of propagating a loss without targets, simply feeding back a custom-generated error, that would be useful (I could not find any examples online). — Dogukan Avci, Nov 04 '19 at 09:46
I am not sure about your exact problem. But, if you pass the outputs of your network through the loss function, then it should get attached to the autograd graph. Perhaps you can provide some code regarding your specific problem (basically your loss function and network and your input and output dimensions), then some more help can be provided. — akshayk07, Nov 04 '19 at 13:19
Added a high level representation of the problem with python code — Dogukan Avci, Nov 04 '19 at 14:26
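
To make the grad_fn point from the comments above concrete, here is a minimal sketch. The `logits` tensor and the toy `metric` are hypothetical stand-ins for the model output and for a simulation written in differentiable tensor ops; neither comes from the question. It shows that a loss built from tensor operations keeps its grad_fn, that rebuilding it from a Python float drops it, and that `100 - metric` and `-metric` give the same gradients.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the model output: 64 x 24 logits that require gradients.
logits = torch.randn(64, 24, requires_grad=True)
probs = torch.softmax(logits, dim=1)

# Hypothetical differentiable "metric": average probability placed on stop 0.
metric = probs[:, 0].mean()

loss_attached = -metric                       # built from tensor ops, keeps grad_fn
loss_detached = torch.tensor(-metric.item())  # rebuilt from a Python float, no grad_fn

print(loss_attached.grad_fn is not None)  # True
print(loss_detached.grad_fn is None)      # True

# 100 - metric and -metric differ only by a constant, so their gradients match.
grad_neg, = torch.autograd.grad(-metric, logits, retain_graph=True)
grad_100, = torch.autograd.grad(100 - metric, logits)
print(torch.allclose(grad_neg, grad_100))  # True
```

If the real simulation returns a plain float, as in the question, the loss ends up in the `loss_detached` situation, which is why autograd alone cannot help there.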

Juan David · Answer 1 · 2019-11-05T16:03:18.817

How does the model output represent the bus routes? Maybe you could try a reinforcement learning approach. Take a look at Deep Q-Learning: it takes an input vector (the state of the system) and outputs an action (usually represented by an index in your output layer), then computes the reward of that action and uses it to train the model, without the need for target values.
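
As a rough illustration of training from a reward alone, here is a minimal sketch. Note that it uses a policy-gradient (REINFORCE, score-function) estimator rather than Deep Q-Learning, because that maps more directly onto a softmax output of shape (64, 24). Everything in it is an assumption for illustration: the `logits` table stands in for the RNN output, and `simulation` is a made-up black box that returns a plain float.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical stand-in for the RNN output: learnable logits, 64 slots x 24 stops.
logits = torch.zeros(64, 24, requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

def simulation(stops):
    """Hypothetical black-box metric: fraction of slots that chose stop 3.
    Returns a plain Python float, so no gradient flows through it."""
    return (stops == 3).float().mean().item()

for step in range(200):
    dist = Categorical(logits=logits)
    stops = dist.sample((16,))                    # 16 sampled routes, shape (16, 64)
    rewards = torch.tensor([simulation(s) for s in stops])  # shape (16,), no grad_fn
    log_probs = dist.log_prob(stops).sum(dim=1)   # shape (16,), differentiable
    advantages = rewards - rewards.mean()         # mean reward as a simple baseline
    loss = -(advantages * log_probs).mean()       # score-function surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_probs = torch.softmax(logits.detach(), dim=1)
```

The key design choice is sampling stops instead of taking the argmax: the log-probability of the sampled stops is differentiable with respect to the logits, so the scalar reward can weight it even though the simulation itself has no gradient. In this toy setup the probability of stop 3 should tend to rise, but the estimator is noisy and a real simulation may need many more samples.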

Here are some resources that might help you get started:

https://towardsdatascience.com/double-deep-q-networks-905dd8325412

https://arxiv.org/pdf/1802.09477.pdf

https://arxiv.org/pdf/1509.06461.pdf

Hope this was useful.

UPDATE

There is a second option: you could define a custom loss function. These functions generally take two arguments, predicted_y and target_y. In your case there is no target_y, so you could pass a dummy target_y and ignore it inside the function (I assume you could call your simulation inside that function and return the metric as the "loss"). Here are examples in PyTorch and Keras.

Keras: Make a custom loss function in keras

PyTorch: PyTorch custom loss function
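
A minimal sketch of such a custom loss in PyTorch, under one important assumption: the metric must be computed with differentiable tensor operations. If the simulation returns a plain float or relies on argmax, the graph is cut and the loss has no grad_fn, as described in the question. `MetricLoss` and `soft_metric` are hypothetical names made up for this example.

```python
import torch
import torch.nn as nn

class MetricLoss(nn.Module):
    """Hypothetical loss that ignores its target and returns the negated metric."""

    def __init__(self, metric_fn):
        super().__init__()
        self.metric_fn = metric_fn

    def forward(self, predicted, target=None):
        # target is a dummy, kept only to match the usual (prediction, target) signature
        return -self.metric_fn(predicted)

def soft_metric(probs):
    """Hypothetical differentiable surrogate: average probability assigned to stop 3."""
    return probs[:, 3].mean()

torch.manual_seed(0)
logits = torch.randn(64, 24, requires_grad=True)  # stand-in for the model output
criterion = MetricLoss(soft_metric)

loss = criterion(torch.softmax(logits, dim=1), target=None)
loss.backward()  # works because soft_metric stays inside the autograd graph
```

Minimizing the negated metric is equivalent to maximizing the metric, which matches the `-metric` suggestion in the comments on the question.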

Thank you for your response. I know deep reinforcement learning is another way to approach the problem. I was wondering if an RNN could potentially solve it. The output is 64*24, which I interpret as 8*8*24. The last 24 are softmax results over the 24 different bus stops, so one bus stop is classified in each. I take the max of those 24 as the classified bus stop. The second 8 represents which bus that bus stop belongs to, and the number of buses is fixed. The first 8 represents the sequence, so at each sequence step a bus stop is picked for each bus. — Dogukan Avci, Nov 04 '19 at 07:44
@Dogukan I have edited the post with some information that might help you. Good luck. — Juan David, Nov 05 '19 at 17:44
