I just discovered something strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)

fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()

xt = axe.get_xticks()

When I try to format my xticklabels I get strange behaviour, so I inspected the objects to understand it and found the following:

  • t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DatetimeIndex is what I expect;
  • xt = [175320, 175488]: the xticks are integers, but they are not equal to a number of days since the epoch (I have no idea what they are);
  • xt[-1] - xt[0] = 168: they look more like indices, since the span matches the number of hourly steps in x (len(x) = 169).

This explains why I cannot format my axis using:

axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))

The first raises an error saying there are too many ticks to generate. The second shows my first tick as Fri 00:00 when it should be Mon 00:00 (in fact matplotlib takes the first tick to be 0481-01-03 00:00; this is where my bug is).

[Figure: Axis Ticks Numbering bug]

It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix this issue.

If I run instead:

fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()

xt = axe.get_xticks()

Everything works as expected, but I lose the convenient features of the pandas.DataFrame.plot method, such as curve labeling. Here xt = [726468., 726475.].

How can I properly format my ticks using the pandas.DataFrame.plot method instead of axe.plot, while avoiding this issue?

Update

The problem seems to be about the origin and scale (units) of the underlying numbers used for date representation. I cannot control them, even by trying to force the correct type:

t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')

There is a discrepancy between the matplotlib and pandas representations, and I could not find any documentation of this problem.
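
The numbers quoted above are consistent with two different conventions: hours since the Unix epoch for the pandas ticks, and day ordinals counted from 0001-01-01 for the matplotlib ticks. The latter assumes a matplotlib older than 3.3, where the default date epoch later changed. The sketch below uses only the standard library to reproduce the values; the helper name hours_since_unix_epoch is hypothetical, and reading the pandas ticks as hourly period ordinals is an assumption based on the arithmetic, not something stated in the question.

```python
from datetime import date, datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)


def hours_since_unix_epoch(ts):
    """Whole hours elapsed between 1970-01-01 00:00 and ts (hypothetical helper)."""
    return (ts - UNIX_EPOCH) // timedelta(hours=1)


start = datetime(1990, 1, 1)
end = datetime(1990, 1, 8)

# Assumed pandas convention: one unit per hour, counted from the Unix epoch.
pandas_like = [hours_since_unix_epoch(start), hours_since_unix_epoch(end)]

# Day-ordinal convention: one unit per day, with 0001-01-01 as day 1.
matplotlib_like = [start.toordinal(), end.toordinal()]

# Reading an hour count as if it were a day ordinal lands in the year 481.
misread = date.fromordinal(pandas_like[0])
weekday = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"][misread.weekday()]

print(pandas_like)                   # [175320, 175488]
print(matplotlib_like)               # [726468, 726475]
print(misread.isoformat(), weekday)  # 0481-01-03 Fri
```

If this reading is right, a matplotlib date formatter applied to the pandas tick values interprets hours as days, which would account for both the 0481-01-03 date and the Fri label.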

jlandercy
  • Did you try the solution in http://stackoverflow.com/questions/12945971/pandas-timeseries-plot-setting-x-axis-major-and-minor-ticks-and-labels ? – tmrlvi May 12 '17 at 12:58
  • Also see: http://stackoverflow.com/questions/24620712/how-to-plot-a-pandas-timeseries-using-months-year-resolution-with-few-lines-of?rq=1 Since it's unclear what you would be missing when using the matplotlib command ("cool stuff etc." is completely imprecise), it's hard to help you here. – ImportanceOfBeingErnest May 13 '17 at 09:38
  • @ImportanceOfBeingErnest, I would like to keep my call to DataFrame.plot, at least because it handles labels for legend. – jlandercy May 15 '17 at 08:20
  • Could this be a [XY Problem](http://xyproblem.info)? – ImportanceOfBeingErnest May 15 '17 at 08:23
  • @ImportanceOfBeingErnest, I wonder why my dates range from year 0481 when I explicitly set them to 1990. I think once I understand this, many things will fall into place. Thank you for pointing out the XY problem, but so far I do not see it that way. – jlandercy May 15 '17 at 08:27
  • @tmrlvi, Thank you for pointing this out. I have added a picture of my problem – jlandercy May 15 '17 at 08:37
  • The date representation used by pandas is simply different from that of matplotlib. So you cannot use a matplotlib formatter for the dates produced by pandas. The question is, would you like to be stuck with this (the Y) or would you like to tell us the actual problem (the X in the XY-problem). – ImportanceOfBeingErnest May 15 '17 at 08:54
  • @ImportanceOfBeingErnest, Are there pandas formatters? Where is the date representation described? – jlandercy May 15 '17 at 09:11
  • No, there are no pandas formatters. But that is not a problem, because as in the question I linked to, you can use the matplotlib plot command and format the axes to your liking. – ImportanceOfBeingErnest May 15 '17 at 10:07
  • @ImportanceOfBeingErnest, there is a way to enforce xticks through DataFrame.plot(). Thank you anyway. – jlandercy May 16 '17 at 13:26
  • Sure, but that was not the question you asked. – ImportanceOfBeingErnest May 16 '17 at 13:29

1 Answer

Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import matplotlib.dates as dates

t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)

# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index

# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)

# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()

plt.show()

Like this?
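
A different route, not taken from the code above, is to pass x_compat=True to DataFrame.plot. As far as I know, this tells pandas to skip its own tick-resolution adjustment and plot the index in matplotlib's date units, so the locator and formatter from the question can be applied directly. This is a minimal sketch under those assumptions: the column name "signal" is hypothetical, the Agg backend is selected only so it runs without a display, and freq="h" is used in place of the question's '1H' because newer pandas versions deprecate the uppercase alias.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

t = pd.date_range("1990-01-01", "1990-01-08", freq="h")
# "signal" is a hypothetical column name, used to show the legend label.
x = pd.DataFrame(np.random.rand(len(t)), index=t, columns=["signal"])

fig, axe = plt.subplots()
# x_compat=True keeps the x axis in matplotlib date units.
x.plot(ax=axe, x_compat=True)

# The locator and formatter from the question.
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
fig.autofmt_xdate()

fig.canvas.draw()  # use plt.show() instead when working interactively

# Convert the tick positions back to datetimes to inspect them.
ticks = [mdates.num2date(v) for v in axe.get_xticks()]
legend_labels = axe.get_legend_handles_labels()[1]
```

The call still goes through DataFrame.plot, so the legend label comes from the column name as usual.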

Kyle