
The Problem

I have two dataframes that I combine and then melt with pandas. I need to plot them on a grid of subplots (as below), and the code needs to scale. Each dataframe holds one variable ('x' and 'y' here), which together form the 'key' column below, across multiple stations (just 2 here, but this needs to scale too). I've used relplot() to plot both variables on each graph, with each station on a separate graph.

Is there any way to keep this format but add a second y axis to each plot? 'x' and 'y' need to be on different scales in my actual data. I've seen examples where the relplot call is stored with y = the first variable, and a second lineplot call is added for the second variable on an axis created with ax.twinx(). In the example below, 'x' and 'y' would then each have their own y axis on the same graph.

How would I make that work with a melted dataframe (e.g. below) where 'key' holds 2 variables and 'station' can have n values? Or is the answer to scrap that df format and start again?

Example Code

The multi-plot as it stands:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])

#keep information where each data point comes from
x["key"], y["key"] = "x", "y"
#moving index into a column 
x = x.reset_index()
y = y.reset_index()
#and changing it to datetime values that seaborn can understand
#necessary because pd.Period data is used
x["index"] = pd.to_datetime(x["index"].astype(str))
y["index"] = pd.to_datetime(y["index"].astype(str))

#combining dataframes and reshaping 
df = pd.concat([x, y]).melt(["index", "key"], var_name="station", value_name="station_value")

#plotting
fg = sns.relplot(data=df, x = "index", y = "station_value", kind = "line", hue = "key", row = "station")

#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))

plt.show()
Trenton McKinney
Ndharwood

2 Answers


You could call relplot for only one key (without hue), then, similar to the linked thread, loop over the subplots, create a twinx axis for each, and lineplot the second key/station combo:

#plotting
fg = sns.relplot(data=df[df['key']=='x'], x="index", y="station_value", kind="line", row="station")

for station, ax in fg.axes_dict.items():  
    ax1 = ax.twinx()
    sns.lineplot(data=df[(df['key'] == 'y') & (df['station'] == station)], x='index', y='station_value', color='orange', ci=None, ax=ax1)
    ax1.set_ylabel('')


BigBen
  • Genius. I added `facet_kws={'sharey': False, 'sharex': True}` from answer below to allow the y scales to differ from site to site, as my real world data for 'x' in 'key' differs between stations. Added that and plot size arguments to the initial `relplot()` call. – Ndharwood Feb 10 '22 at 20:11
  • Just noticed that removing `'hue=..'` gets rid of the legend. Is there any way to add that back in? I have tried many ways so far, the [best](https://stackoverflow.com/questions/58931770/legend-in-for-loop-does-not-work-properly-and-just-shows-the-last-curve) of which doesn't work. – Ndharwood Feb 11 '22 at 17:21
  • I'm assuming you want all the labels in one legend, like demonstrated [here](https://stackoverflow.com/questions/5484922/secondary-axis-with-twinx-how-to-add-to-legend)? – BigBen Feb 11 '22 at 17:25
  • Yep, a single legend. Can't seem to get it to recognise more than 1 label for the legend (even outside the loop), which the methods rely on. Saving the `sns.lineplot` call as `fig` object and then calling `fig.legend` on that doesn't work, for example. – Ndharwood Feb 11 '22 at 18:36
  • It's probably easier to create the legend from scratch. Not sure how production-worthy this alternative is, but here goes. Add `label='x'` to the `relplot` call, and `label='y', legend=False` to the `lineplot` call. Then `lines, labels = fg.fig.axes[0].get_legend_handles_labels()`, `lines2, labels2 = fg.fig.axes[-1].get_legend_handles_labels()`, `fg.fig.legend(lines+lines2, labels+labels2, loc='upper right')`. – BigBen Feb 11 '22 at 18:54
  • Excellent, thanks. Added `borderaxespad=3.5` to that last fig.legend call to shift my legend back inside the plot axis lines. – Ndharwood Feb 11 '22 at 19:24
  • Oh I'd just do `fg.axes[0,0].legend(lines+lines2, labels+labels2, loc='upper right')` then - add a legend to the Axes, not the figure. – BigBen Feb 11 '22 at 19:31

Not what you asked for, but you could make a grid of relplots with different y-axes without changing your df shape:

fg = sns.relplot(
    data=df,
    x = "index",
    y = "station_value",
    kind = "line",
    col = "key",
    row = "station",
    facet_kws={'sharey': False, 'sharex': True},
)


mitoRibo
  • Still helpful, I used the `facet_kws` arg and may need to plot the vars separately like this down the line. – Ndharwood Feb 10 '22 at 20:12