I would like to create a plot using a pandas timeseries in one subplot and a rectangle in another subplot.

If I don't include the subplots, I can achieve this pretty easily:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.patches as mpatches

N = 100
np.random.seed(N)
dates = pd.date_range(start='2018-01-01', periods=N, freq='D')
one_third_delta = (dates[-1] - dates[0])/3
one_third_stamp = dates[0] + one_third_delta
ts = pd.Series(index=dates, data=np.random.randn(N))

def add_rectangle(ax, x, y, width, height, **kwargs):
    ax.add_patch(mpatches.Rectangle(
        (x, y),
        width,
        height,
        **kwargs
    ))

args = [one_third_stamp, -1, one_third_delta, 2]

kwargs = {
    'facecolor': 'orange',
    'edgecolor': 'None',
    'alpha': 0.5,
}

# Plot 1: 1 subplot with ts plotted first (Working)
fig, ax = plt.subplots()
ts.plot(ax=ax)
add_rectangle(ax, *args, **kwargs)
plt.savefig('plot1.png')
plt.close(fig)

Plot 1

However, things already start to get weird when I try adding the rectangle first:

# Plot 2: 1 subplot with ts plotted second (Not Working)
fig, ax = plt.subplots()
add_rectangle(ax, *args, **kwargs)
ts.plot(ax=ax)
plt.savefig('plot2.png')
plt.close(fig)

Plot 2

If I try splitting out the two plots, neither approach works:

# Plot 3: 2 subplots with ts plotted first (Not Working)
fig, axes = plt.subplots(2, sharex=True)
ts.plot(ax=axes[1])
add_rectangle(axes[0], *args, **kwargs)
plt.savefig('plot3.png')
plt.close(fig)

# Plot 4: 2 subplots with ts plotted second (Not Working)
fig, axes = plt.subplots(2, sharex=True)
add_rectangle(axes[0], *args, **kwargs)
ts.plot(ax=axes[1])
plt.savefig('plot4.png')
plt.close(fig)

Plot 3

Plot 4

I've found two workarounds.

The first involves casting everything to a float with matplotlib.dates.date2num:

# Plot 5: 2 subplots with date2num (Working)
two_thirds_stamp = one_third_stamp + one_third_delta
args_date2num = [
    mdates.date2num(one_third_stamp),
    -1,
    mdates.date2num(two_thirds_stamp) - mdates.date2num(one_third_stamp),
    2,
]
df = ts.to_frame().reset_index()
df.columns = ['date', 'value']
df['num'] = df.date.apply(mdates.date2num)
fig, axes = plt.subplots(2, sharex=True)
add_rectangle(axes[0], *args_date2num, **kwargs)
axes[1].plot_date(df.num, df.value, ls='-', marker=None)
axes[0].set_ylim(axes[1].get_ylim())
plt.savefig('plot5.png')
plt.close(fig)

Plot 5

This isn't great for two reasons:

  1. I lose the nice ticklabel formatting that pandas uses.
  2. As far as I can tell, date2num is incompatible with how pandas internally represents datetimes as floats. So if I use date2num at all, all other datetimes must be converted too.

The other workaround involves a dummy plot:

# Plot 6: 2 subplots with alpha=0 dummy (Working)
fig, axes = plt.subplots(2, sharex=True)
dummy_ts = ts[::(len(ts)-1)] + 10 # make it out of sight
dummy_ts.plot(ax=axes[0], alpha=0) # and invisible for good measure
add_rectangle(axes[0], *args, **kwargs)
ts.plot(ax=axes[1])
axes[0].set_ylim(axes[1].get_ylim())
plt.savefig('plot6.png')
plt.close(fig)

Plot 6

My question (finally): why is this necessary? What changes between doing this on a single subplot vs. multiple subplots? Is there a better, more canonical way?


Python version:

Python 3.6.3 (v3.6.3:2c5fed86e0) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

Pip freeze:

cycler==0.10.0
kiwisolver==1.0.1
matplotlib==2.2.0
numpy==1.14.2
pandas==0.22.0
pyparsing==2.2.0
python-dateutil==2.7.0
pytz==2018.3
six==1.11.0
hhquark

1 Answer

I think you found the reason yourself: the units pandas uses to represent datetimes on a matplotlib axes may be completely different from matplotlib's own date units (this is not always the case; it depends on the span of the data).

Since I don't know of any way to convert the rectangle's coordinates to the pandas units, the only option is to plot the pandas plot in matplotlib units.
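
That said, for this particular daily series one possible conversion can be sketched. It assumes that pandas places a daily series on the axis at its period ordinals (days since 1970-01-01), which is an internal detail that may differ between versions or frequencies; rect_x and rect_width are illustrative names, not part of the original code.

```python
import pandas as pd

dates = pd.date_range(start='2018-01-01', periods=100, freq='D')
one_third_delta = (dates[-1] - dates[0]) / 3
one_third_stamp = dates[0] + one_third_delta

# Assumption: for freq='D', pandas plots at the daily period ordinal
# (days since 1970-01-01), so the rectangle is expressed in those units.
rect_x = one_third_stamp.to_period('D').ordinal
rect_width = one_third_delta.days

print(rect_x, rect_width)
```

If the assumption holds, passing these plain numbers to add_rectangle instead of the Timestamp and Timedelta would sidestep the unit conversion on the rectangle's side.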

The problem

But let's start at the beginning. Cases 1 and 2 work fine for me.

For the third case, the rectangle is added to the other axes, which has a different scale. This can be seen by printing the transform.

def add_rectangle(ax, x, y, width, height, **kwargs):
    rect = mpatches.Rectangle( (x, y), width, height, **kwargs )
    ax.add_patch(rect)
    return rect

# Case 1 - working
fig, ax = plt.subplots()
ts.plot(ax=ax)
r = add_rectangle(ax, *args, **kwargs)
print r.get_transform()

# This prints
# BboxTransformTo(
#        Bbox(x0=17565.0, y0=-1.0, x1=17598.0, y1=1.0)),

# Case 3 - non-working
fig, axes = plt.subplots(2, sharex=True)
ts.plot(ax=axes[1])
r = add_rectangle(axes[0], *args, **kwargs)
print r.get_transform()

# BboxTransformTo(
#        Bbox(x0=736728.0, y0=-1.0, x1=736761.0, y1=1.0)),

In the second printout (Case 3), the units are the matplotlib date units, because pandas did not change the transform for the axes in which it did not plot anything.
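
The two x0 values printed above can be reproduced with the standard library alone. This is a minimal sketch, assuming that pandas' units for a daily series count days since 1970-01-01 and that matplotlib 2.2 date numbers count days since 0001-01-01 plus one; neither assumption is spelled out in the output above, and newer matplotlib releases may use a different epoch.

```python
from datetime import date

# Left edge of the rectangle: one third of the way through the 99-day span
# that starts on 2018-01-01, i.e. 33 days later.
one_third_stamp = date(2018, 2, 3)

# Assumed pandas unit for a daily series: days since 1970-01-01.
pandas_units = (one_third_stamp - date(1970, 1, 1)).days

# Assumed matplotlib 2.2 unit: days since 0001-01-01 plus one,
# which equals the proleptic Gregorian ordinal of the date.
matplotlib_units = one_third_stamp.toordinal()

print(pandas_units)      # 17565, the x0 printed for Case 1
print(matplotlib_units)  # 736728, the x0 printed for Case 3
```

The same date therefore lands about 719,000 units apart depending on which converter handled it, which is why the rectangle ends up far outside the visible range.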

The solution

The easiest option is probably to tell pandas not to change the scale. This would be done using

x_compat=True

This has essentially the same effect as plotting everything in matplotlib units.

# Plot 3: 2 subplots with ts plotted first
fig, axes = plt.subplots(2, sharex=True)
ts.plot(ax=axes[1], x_compat=True)
r = add_rectangle(axes[0], *args, **kwargs)

# Plot 4: 2 subplots with ts plotted second
fig, axes = plt.subplots(2, sharex=True)
add_rectangle(axes[0], *args, **kwargs)
ts.plot(ax=axes[1], x_compat=True)

So the nice pandas formatting is indeed gone, but you can replicate it with the matplotlib.dates formatters. For example, this post presents an easy solution for adding the days. Here, you would probably rather use a FuncFormatter, as follows:

fig, axes = plt.subplots(2, sharex=True)
ts.plot(ax=axes[1], x_compat=True)
r = add_rectangle(axes[0], *args, **kwargs)

import matplotlib.dates as mdates
import matplotlib.ticker as mticker

def f(val, _):
    d = mdates.num2date(val)
    if d.month == 1:
        return d.strftime("%b\n%Y")
    else:
        return d.strftime("%b")

axes[1].xaxis.set_major_locator(mdates.MonthLocator())
axes[1].xaxis.set_minor_locator(mdates.WeekdayLocator())
axes[1].xaxis.set_major_formatter(mticker.FuncFormatter(f))
fig.autofmt_xdate(rotation=0,ha="center")

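
To see which labels the formatter yields without drawing a figure, the same month/year rule can be applied to plain datetime objects. This is only a sketch: month_label is a hypothetical stand-in for f that skips the mdates.num2date conversion, and the month abbreviations assume an English (or C) locale.

```python
from datetime import datetime

def month_label(d):
    # Same rule as f above: show the year only under January.
    if d.month == 1:
        return d.strftime("%b\n%Y")
    return d.strftime("%b")

# One label per major tick placed by MonthLocator for the 2018 data.
labels = [month_label(datetime(2018, m, 1)) for m in (1, 2, 3, 4)]
print(labels)
```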

ImportanceOfBeingErnest
  • Thanks! I didn't know about the `x_compat` option. Any guess why Plot 2 worked for you but not me? – hhquark Mar 16 '18 at 03:20
  • @ImportanceOfBeingErnest Can you please help me in this data visualization question: https://stackoverflow.com/questions/49315084/how-to-resize-the-correlation-plot-for-better-visualization?noredirect=1#comment85632529_49315084 – stone rock Mar 16 '18 at 08:20
  • This might be version dependent. I tested this with matplotlib 2.2 and pandas 0.20.1. – ImportanceOfBeingErnest Mar 16 '18 at 09:15
  • @ImportanceOfBeingErnest: Thank you again. It was both version and backend dependent. Matplotlib 2.2 and macosx backend seemed to do it. I actually had to upgrade to python 3.6 to get the macosx backend to work. – hhquark Mar 16 '18 at 22:22
  • It can't be depending on the backend. The backend is just how things are drawn on the canvas and has nothing to do with the units on axes. – ImportanceOfBeingErnest Mar 16 '18 at 22:24
  • Hm. Then maybe it was the upgrade to python 3? I couldn't get Plot2 to work no matter what Pandas+Matplotlib versions I tried. What's weird is you appear to be using python 2 based on `print r.get_transform()`. At this point, though, I'm just happy I've a working plot and am not too concerned with exactly how I fixed it. – hhquark Mar 16 '18 at 22:32
  • Yes I'm using python 2.7 here. But that shouldn't matter either. It's really only the pandas and matplotlib versions which would determine the outcome. – ImportanceOfBeingErnest Mar 16 '18 at 22:37