The sample dataset df below contains multi-month time series prediction results for two variables (import and GDP). Each row is one prediction made with a specific set of hyperparameters. I'm trying to backtest the results by plotting the distribution of the predicted values against the real value for each month.
To do that, I'm considering looping over the variable_name column and, for each variable, drawing a Matplotlib scatter plot with one x value (date) and multiple y values, plotting pred_value as blue points and real_value as red points to distinguish them.
The purpose of the plot: given many forecast results for each month and each variable, I want to highlight the real value and see the distribution of the predictions at a glance:
   variable_name       date   pred_value  real_value
0         import  2022/3/31  2721.795166     2736.20
1         import  2022/3/31  2721.795166     2736.20
2         import  2022/3/31  2705.501709     2736.20
3         import  2022/4/30  2795.655273     2759.98
4         import  2022/4/30  2694.454834     2759.98
5         import  2022/4/30  2655.357178     2759.98
6            GDP  2022/3/31     1.129989        1.10
7            GDP  2022/3/31     1.129989        1.10
8            GDP  2022/3/31     1.170668        1.10
9            GDP  2022/4/30     1.293045        1.30
10           GDP  2022/4/30     1.292015        1.30
11           GDP  2022/4/30     1.279539        1.30
How could I draw the proper plots to achieve that? Thanks in advance for your help.
Reference code:
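import matplotlib.pyplot as plt

# One tuple of y values per date; the last entry in each tuple is the real value.
y = [(2721.795166, 2721.795166, 2705.501709, 2736.20), (2795.655273, 2694.454834, 2655.357178, 2759.98)]
x = [1, 2]
for xe, ye in zip(x, y):
    # Repeat the x position so every y value for this date lands in the same column.
    plt.scatter([xe] * len(ye), ye)
# Set tick positions and labels on the current axes (plt.axes() would create a new, empty one).
plt.xticks(x, ['2022/3/31', '2022/4/30'])
plt.show()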
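Applied to the full df, what I have in mind is roughly the sketch below. It rebuilds the sample data from the table above and draws one subplot per variable, with pred_value in blue and real_value in red. The helper name plot_backtest is hypothetical, and the sketch assumes that real_value is the same for every row sharing a variable_name and date, so only one red point is drawn per date.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the sample df from the table in the question.
df = pd.DataFrame({
    "variable_name": ["import"] * 6 + ["GDP"] * 6,
    "date": (["2022/3/31"] * 3 + ["2022/4/30"] * 3) * 2,
    "pred_value": [2721.795166, 2721.795166, 2705.501709,
                   2795.655273, 2694.454834, 2655.357178,
                   1.129989, 1.129989, 1.170668,
                   1.293045, 1.292015, 1.279539],
    "real_value": [2736.20] * 3 + [2759.98] * 3 + [1.10] * 3 + [1.30] * 3,
})


def plot_backtest(df):
    """Hypothetical helper: one subplot per variable, predictions blue, real values red."""
    # sort=False keeps the variables in their order of first appearance.
    groups = list(df.groupby("variable_name", sort=False))
    fig, axes = plt.subplots(1, len(groups), figsize=(5 * len(groups), 4), squeeze=False)
    for ax, (name, g) in zip(axes[0], groups):
        # Map each date string to an integer x position, in order of appearance.
        dates = list(g["date"].unique())
        pos = {d: i for i, d in enumerate(dates)}
        ax.scatter(g["date"].map(pos), g["pred_value"],
                   color="blue", alpha=0.6, label="pred_value")
        # Assumption: real_value is constant per date, so keep one row per date.
        real = g.drop_duplicates("date")
        ax.scatter(real["date"].map(pos), real["real_value"],
                   color="red", zorder=3, label="real_value")
        ax.set_xticks(list(range(len(dates))))
        ax.set_xticklabels(dates)
        ax.set_title(name)
        ax.legend()
    fig.tight_layout()
    return fig, axes
```

Calling plot_backtest(df) followed by plt.show() should display the two panels. Separate subplots are used because import and GDP are on very different scales, so a shared y axis would flatten the GDP points.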
References:
pyplot: Plotting scatter plot with multiple Y values and categorical X values