The sample dataset df below contains multi-month time series prediction results for two variables (import and GDP). Each row is one prediction made with a specific set of hyperparameters. I'm trying to backtest the results by plotting the distribution of predicted values against the real value for each month.

To do that, I'm considering looping over the variable_name column and, for each variable, drawing a Matplotlib scatter plot with one x value (date) and multiple y values, with pred_value points in blue and real_value points in red to tell them apart.

The purpose of the plot: given many forecast results for each month and each variable, I want to highlight the real value and see at a glance how the predictions are distributed around it:

   variable_name       date   pred_value  real_value
0         import  2022/3/31  2721.795166     2736.20
1         import  2022/3/31  2721.795166     2736.20
2         import  2022/3/31  2705.501709     2736.20
3         import  2022/4/30  2795.655273     2759.98
4         import  2022/4/30  2694.454834     2759.98
5         import  2022/4/30  2655.357178     2759.98
6            GDP  2022/3/31     1.129989        1.10
7            GDP  2022/3/31     1.129989        1.10
8            GDP  2022/3/31     1.170668        1.10
9            GDP  2022/4/30     1.293045        1.30
10           GDP  2022/4/30     1.292015        1.30
11           GDP  2022/4/30     1.279539        1.30
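
For anyone who wants to reproduce the data, the table can be rebuilt roughly as follows. This is only a sketch: the `summary` frame and its column names (`pred_min`, `pred_max`, `real`) are hypothetical additions for illustration and are not part of the original data.

```python
import pandas as pd

# Rebuild the sample table shown above.
df = pd.DataFrame({
    "variable_name": ["import"] * 6 + ["GDP"] * 6,
    "date": (["2022/3/31"] * 3 + ["2022/4/30"] * 3) * 2,
    "pred_value": [2721.795166, 2721.795166, 2705.501709,
                   2795.655273, 2694.454834, 2655.357178,
                   1.129989, 1.129989, 1.170668,
                   1.293045, 1.292015, 1.279539],
    "real_value": [2736.20] * 3 + [2759.98] * 3 + [1.10] * 3 + [1.30] * 3,
})
df["date"] = pd.to_datetime(df["date"])

# Hypothetical numeric check of the spread: the range of predictions per
# variable and month, next to the single real value for that month.
summary = df.groupby(["variable_name", "date"], sort=False).agg(
    pred_min=("pred_value", "min"),
    pred_max=("pred_value", "max"),
    real=("real_value", "first"),
)
print(summary)
```

The printed summary gives a quick numeric view of what the plot is meant to show visually.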

How could I draw plots that achieve this? Thanks in advance for your help.

Reference code:

import matplotlib.pyplot as plt

# Each tuple holds the three predictions followed by the real value for one month.
y = [(2721.795166, 2721.795166, 2705.501709, 2736.20), (2795.655273, 2694.454834, 2655.357178, 2759.98)]
x = [1, 2]

for xe, ye in zip(x, y):
    plt.scatter([xe] * len(ye), ye)

# Set tick positions and labels together; plt.axes() would add a new, empty Axes on top.
plt.xticks(x, ['2022/3/31', '2022/4/30'])
plt.show()
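
The loop described earlier could be extended from the reference code to the full DataFrame along these lines. This is a minimal sketch, not a definitive solution: the helper name `plot_backtest` is hypothetical, the dates are left as strings so that Matplotlib treats them as categories, and one subplot per variable is an assumption made because import and GDP are on very different scales.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "variable_name": ["import"] * 6 + ["GDP"] * 6,
    "date": (["2022/3/31"] * 3 + ["2022/4/30"] * 3) * 2,
    "pred_value": [2721.795166, 2721.795166, 2705.501709,
                   2795.655273, 2694.454834, 2655.357178,
                   1.129989, 1.129989, 1.170668,
                   1.293045, 1.292015, 1.279539],
    "real_value": [2736.20] * 3 + [2759.98] * 3 + [1.10] * 3 + [1.30] * 3,
})


def plot_backtest(df):
    """Hypothetical helper: one subplot per variable, predictions vs. real value."""
    groups = list(df.groupby("variable_name", sort=False))
    fig, axes = plt.subplots(len(groups), 1, figsize=(8, 3 * len(groups)),
                             squeeze=False)
    for ax, (name, g) in zip(axes[:, 0], groups):
        # All predictions for this variable, in blue.
        ax.scatter(g["date"], g["pred_value"], color="tab:blue",
                   alpha=0.6, label="pred_value")
        # The real value repeats on every row, so keep one point per date, in red.
        real = g.drop_duplicates("date")
        ax.scatter(real["date"], real["real_value"], color="tab:red",
                   zorder=3, label="real_value")
        ax.set_title(name)
        ax.legend()
    fig.tight_layout()
    return fig


fig = plot_backtest(df)
```

Calling `plt.show()` afterwards displays the figure. Each subplot keeps its own y-axis, so the GDP points are not flattened by the much larger import values.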

References:

pyplot: Plotting scatter plot with multiple Y values and categorical X values

Python Scatter Plot with Multiple Y values for each X

plot multiple y values against one x values in python

ah bon
    (1) `df.date = pd.to_datetime(df.date)` (2) `dfm = df.melt(id_vars=['variable_name', 'date'])` (3) `p = sns.relplot(kind='scatter', data=dfm, x='date', y='value', row='variable_name', height=4, aspect=2.5, hue='variable', palette=['tab:blue', 'tab:red'], facet_kws={'sharey': False, 'sharex': True})` (4) `p.set_xticklabels(rotation=30)` [Code and Plot](https://i.stack.imgur.com/PXdkM.png). Also `import seaborn as sns` – Trenton McKinney May 29 '22 at 13:49
    @TrentonMcKinney Many thanks for your kind help, it seems to be working. By the way, would you mind posting your code as an answer to this question? – ah bon May 29 '22 at 15:45
    I closed this as a duplicate. What I posted is essentially the same as the upvoted answer, with the only difference being categories on the x-axis, instead of dates. I'm glad that works for you. Have a good day. I included each duplicate that comprises my first comment. (1) using catplot, (2) changing dates to a datetime, (3) rotating the x-axis tick labels, (4) sharing x and y. – Trenton McKinney May 29 '22 at 16:07
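
The steps from the first comment can be put together into a runnable sketch like the one below. It assumes seaborn is installed and rebuilds the sample data inline. One deliberate swap: the tick labels are rotated through Matplotlib's `tick_params` rather than the comment's `p.set_xticklabels(rotation=30)`.

```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "variable_name": ["import"] * 6 + ["GDP"] * 6,
    "date": (["2022/3/31"] * 3 + ["2022/4/30"] * 3) * 2,
    "pred_value": [2721.795166, 2721.795166, 2705.501709,
                   2795.655273, 2694.454834, 2655.357178,
                   1.129989, 1.129989, 1.170668,
                   1.293045, 1.292015, 1.279539],
    "real_value": [2736.20] * 3 + [2759.98] * 3 + [1.10] * 3 + [1.30] * 3,
})

# (1) Convert the date strings to datetimes.
df["date"] = pd.to_datetime(df["date"])

# (2) Long format: 'variable' holds pred_value/real_value, 'value' holds the number.
dfm = df.melt(id_vars=["variable_name", "date"])

# (3) One row of the grid per variable_name; blue predictions, red real values.
p = sns.relplot(kind="scatter", data=dfm, x="date", y="value",
                row="variable_name", hue="variable",
                palette=["tab:blue", "tab:red"], height=4, aspect=2.5,
                facet_kws={"sharey": False, "sharex": True})

# (4) Rotate the x tick labels (swapped in for p.set_xticklabels(rotation=30)).
for ax in p.axes.flat:
    ax.tick_params(axis="x", labelrotation=30)
```

Here `sharey=False` gives import and GDP independent y-axes, while the shared x-axis keeps the months aligned across the two panels.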

0 Answers