I have 2 timeseries dataframes which were sourced from two 2-dimensional arrays. The structure of these dataframes is:

Generate example dataframes

import pandas as pd
import numpy as np
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.arange(8).reshape((4,2))
y = np.arange(8).reshape((4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range, columns = ['station1','station2'])
print(x)
            station1  station2
1981-01-01         0         1
1981-01-02         2         3
1981-01-03         4         5
1981-01-04         6         7

The Aim

I want to generate a multi-plot in which the values of 'x' and 'y' are plotted as lines on the same graph, distinguished by colour, with one 'row' (subplot) per station. Using the example code above, each separate graph would plot one station column.

What I've tried

I tried the seaborn route: first concatenating the two dataframes together - each df represents a variable, so I added them as keys to name these variables after concatenation. Then I used melt to be able to multi-plot them:

df = pd.concat([x , y], keys = ['Var1', 'Var2'])
meltdf = df.melt(var_name = 'Station', value_name = 'Value', ignore_index = False)
print(meltdf)
                  Station  Value
Var1 1981-01-01  station1      0
     1981-01-02  station1      2
     1981-01-03  station1      4
     1981-01-04  station1      6
Var2 1981-01-01  station1      0
     1981-01-02  station1      2
     1981-01-03  station1      4
     1981-01-04  station1      6
Var1 1981-01-01  station2      1
     1981-01-02  station2      3
     1981-01-03  station2      5
     1981-01-04  station2      7
Var2 1981-01-01  station2      1
     1981-01-02  station2      3
     1981-01-03  station2      5
     1981-01-04  station2      7

I want to plot the values of Var1 & Var2 as lines on the same graph for station1, same for station2 and so on. I want to keep the date as index as these should be timeseries plots, with 'date' along the x axis. I tried this non-working code (for example):

import seaborn as sns
sns.relplot(data=df, x = 'Var1', y = 'Var2', kind = 'line', hue = 'keys', row = 'Station')

Should I 'double melt' the dfs instead to have variable type as its own col? The concat + keys step seems incorrect.

Ndharwood

1 Answer

You were on the right track with pd.concat and pd.melt, followed by seaborn relplot. I would approach it like this:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

#data generation
import numpy as np
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])

#keep information where each data point comes from
x["key"], y["key"] = "x", "y"

#combining dataframes and reshaping 
df = pd.concat([x, y]).melt(["key"], var_name="station", value_name="station_value",  ignore_index = False)

#plotting - the datetime conversion might not be necessary 
#depending on the datetime format of your original dataframes
#best approach is conversion to datetime index when creating the dataframes
fg = sns.relplot(data=df, x = pd.to_datetime(df.index.to_timestamp()), y = "station_value", kind = "line", hue = "key", row = "station")

#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))

plt.show()

Sample output: [image not available]
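
The code above keeps the dates as the index, so after `pd.concat` every date shared by both frames appears more than once. The following is a minimal sketch for checking this, assuming the question's example frames where `x` and `y` share the same date range; the names `combined` and `n_duplicates` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Example frames as in the question: same dates, same station columns
date_range = pd.period_range("1981-01-01", "1981-01-04", freq="D")
x = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])
y = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])

# Stacking the frames repeats each date once per source frame
combined = pd.concat([x, y])

print("pandas version:", pd.__version__)
print("index is unique:", combined.index.is_unique)

# Count the index entries that repeat an earlier entry
n_duplicates = int(combined.index.duplicated().sum())
print("duplicated index entries:", n_duplicates)
```

If `is_unique` is `False`, operations that need to align on the index may fail in some pandas versions, which is one reason to move the dates into an ordinary column.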

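If your pandas version cannot handle duplicate index entries (in the comments, the asker reports `ValueError: cannot reindex from a duplicate axis` with pandas 1.3.3), we can move the index into a column instead: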

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

#data generation
import numpy as np
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])

#keep information where each data point comes from
x["key"], y["key"] = "x", "y"
#moving index into a column 
x = x.reset_index()
y = y.reset_index()
#and changing it to datetime values that seaborn can understand
#only necessary because your example contains pd.Period data
x["index"] = pd.to_datetime(x["index"].astype(str))
y["index"] = pd.to_datetime(y["index"].astype(str))

#combining dataframes and reshaping 
df = pd.concat([x, y]).melt(["index", "key"], var_name="station", value_name="station_value")

#plotting
fg = sns.relplot(data=df, x = "index", y = "station_value", kind = "line", hue = "key", row = "station")

#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))

plt.show()
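
On the question of whether the variable type should become its own column: one possible variant, sketched here and not part of the original answer, keeps the asker's `pd.concat(..., keys=...)` step and names both index levels so that `reset_index()` turns them into ordinary columns before melting. The column names `variable`, `date`, and `value` and the frame names `wide` and `long_df` are assumptions chosen for illustration, and `y` is offset by 1 only so the two lines would not overlap.

```python
import numpy as np
import pandas as pd

date_range = pd.period_range("1981-01-01", "1981-01-04", freq="D")
x = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])
# Offset by 1 (an assumption) so x and y are distinguishable
y = pd.DataFrame(np.arange(8).reshape((4, 2)) + 1, index=date_range,
                 columns=["station1", "station2"])

# keys= records which frame each row came from; names= labels both
# index levels so reset_index() produces the columns "variable" and "date"
wide = pd.concat([x, y], keys=["Var1", "Var2"],
                 names=["variable", "date"]).reset_index()

# Convert the Period values to timestamps for plotting
wide["date"] = wide["date"].dt.to_timestamp()

# One row per (variable, date, station) combination
long_df = wide.melt(id_vars=["variable", "date"],
                    var_name="station", value_name="value")
print(long_df.head())
```

The resulting frame has a plain, unique integer index, so it can be passed to the same `sns.relplot` call as above with `x="date"`, `y="value"`, `hue="variable"`, and `row="station"`.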
Mr. T
  • That's weird - I get a ValueError: `cannot reindex from a duplicate axis` trying it this way – Ndharwood Feb 05 '22 at 15:56
  • With the example code or with your real data? I assume `df = pd.concat([x, y])` is the problem here. Does the traceback give more information about the underlying problem? Here: matplotlib 3.5.1, seaborn 0.11.0, NumPy 1.22.1, pandas 1.3.5 – Mr. T Feb 05 '22 at 16:17
  • Both example and real data for me. Lots of 'reindex' errors from Pandas. I'm trying to find different ways to add the date index, as [this](https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean) makes it seem like I have an index with duplicate values. seaborn 0.11.2 pandas 1.3.3 – Ndharwood Feb 05 '22 at 16:44
  • Yes, the index contains duplicate values. I know that in the olden days this posed a problem for pandas. It is unlikely, though, that they resolved this issue between 1.3.3 and 1.3.5. Well, then we have to move the index to a column. – Mr. T Feb 05 '22 at 16:48
  • **Pandas Version does matter!** Interestingly - this didn't plot, so I upgraded from 1.3.3 to 1.4.0 and it plotted. Many thanks @Mr. T - the index resetting seems convoluted, but the odd use-case dictates that. – Ndharwood Feb 05 '22 at 18:11