I am running a DQN algorithm that tries to minimize the total distance traveled by a vehicle (VRP). During training, as you can see in the images, everything works fine: the loss is decreasing, the average length is decreasing, and the reward is increasing.
However, in the evaluation phase the model behaves unexpectedly. I run 100 evaluation iterations. In the first run the results are good, but later evaluation runs sometimes give good results and sometimes very bad ones. In the good runs I get a minimum total distance (min length) of 4, but sometimes the evaluation returns a minimum of 13, even though the evaluation is done on the same trained model.
So my question is: is this normal behavior? And is there a way to improve these evaluation results?
P.S.:
- The number of training episodes is 4000 (I also tried 10000 and got the same behavior).
- The data is a random array of coordinates plus an adjacency matrix of Euclidean distances between those coordinates. Every new episode gets new random coordinates and a new distance matrix.
- The same applies to evaluation: I run 100 evaluation iterations, and each iteration uses new random data.
- In the evaluation I don't use any penalties or rewards. I only use them in training.
- I am using PyTorch in this project.
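To make the data setup above concrete, here is a minimal sketch of how such instances could be generated. It is not my actual code: the function names `make_instance` and `make_eval_set`, the unit-square sampling, and the seeding scheme are all assumptions made for illustration. Seeding each instance means two evaluation runs see exactly the same coordinates, so differences between runs cannot come from the data.

```python
import math
import random


def make_instance(n_nodes, seed=None):
    """Return (coords, dist) for one random instance.

    Assumption: coordinates are sampled uniformly in the unit square.
    """
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    coords = [(rng.random(), rng.random()) for _ in range(n_nodes)]
    # Full matrix of Euclidean distances between all pairs of coordinates.
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    return coords, dist


def make_eval_set(n_instances, n_nodes, base_seed=0):
    """Build a reproducible evaluation set (hypothetical helper)."""
    return [make_instance(n_nodes, seed=base_seed + i) for i in range(n_instances)]
```

Calling `make_eval_set(100, 11, base_seed=0)` twice returns identical instances, so the same 100 problems can be reused for every evaluation run.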
Here's an example of the evaluation output:
shortest avg length found: 5.406301895156503 (this is the value from training)
Here are two example solutions from evaluation:
Solution 1:
[0, 1, 9, 4, 2, 3, 5, 0, 6, 7, 8, 10]
length 4.955087028443813
Solution 2:
[0, 4, 9, 3, 13, 0, 7, 13, 0, 10, 0, 6, 11, 5, 12, 1, 12, 0, 2, 12, 0, 8, 0]
length 10.15813521668315
The first 100 evaluations are similar to solution 1, but when I rerun the evaluation for another 100 iterations I get results similar to solution 2.
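To compare the two kinds of solutions more concretely, here is a minimal sketch of a route check. The helper names `route_length` and `repeated_customers` are hypothetical, and the check assumes that node 0 is the depot and that every other node should be visited at most once, which may not match my actual problem definition.

```python
from collections import Counter


def route_length(route, dist):
    """Sum the distances between consecutive nodes of a route."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def repeated_customers(route, depot=0):
    """Return {node: visit count} for non-depot nodes visited more than once.

    Assumption: node 0 is the depot and may appear any number of times.
    """
    counts = Counter(node for node in route if node != depot)
    return {node: count for node, count in counts.items() if count > 1}


solution_1 = [0, 1, 9, 4, 2, 3, 5, 0, 6, 7, 8, 10]
solution_2 = [0, 4, 9, 3, 13, 0, 7, 13, 0, 10, 0, 6, 11, 5, 12,
              1, 12, 0, 2, 12, 0, 8, 0]

print(repeated_customers(solution_1))
print(repeated_customers(solution_2))
```

Under that assumption, solution 1 has no repeated nodes, while solution 2 visits node 13 twice and node 12 three times, which would account for part of the extra length.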