
I am trying to plot some data using matplotlib and pandas. However, when I use DateFormatter, the dates render incorrectly depending on how I filter the DataFrame:

The dates in the two examples below render with matplotlib as 'August 20 00 2013', as expected:

df['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()

df[df['metric1']>1000]['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()
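
As a sanity check on the format string itself, here is a minimal sketch using only the standard library. It assumes timestamps in a 'YYYY-MM-DD HH' layout (the sample value is illustrative) and the default English month names, and it shows the label that '%B %d %H %Y' should yield for a correctly interpreted date:

```python
import datetime as dt

# Illustrative sample value in a 'YYYY-MM-DD HH' layout (an assumption).
raw = '2013-08-20 00'

# Parse the string, then format it with the same pattern passed to DateFormatter.
parsed = dt.datetime.strptime(raw, '%Y-%m-%d %H')
label = parsed.strftime('%B %d %H %Y')

print(label)
```

If the axis shows something very different from this, the format string is probably not the culprit; the numbers handed to the formatter are.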

However, with the code below, the dates are rendered as 'February 01 00 1048':

df[df['browser']=='Chrome/29']['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()
DJElbow
  • Without seeing some of these data it's going to be hard to diagnose the problem. – Phillip Cloud Sep 17 '13 at 23:22
  • _maybe_ related http://stackoverflow.com/questions/13988111/importing-pandas-in-python-changes-how-matplotlib-handles-datetime-objects/13993480#13993480 on the chance that pandas is still fouling up the date handling code. – tacaswell Sep 18 '13 at 01:04
  • The dates look like this in the original file: '2013-08-18 00', followed by a browser (in the format above) and 3 metrics. Here is how I am pulling the data from the file into pandas: `def dateParserHour(time_string): return datetime.datetime.strptime(time_string, '%Y-%m-%d %H')` and `pd.read_table('file.txt', index_col=0, parse_dates=True, date_parser=dateParserHour)` – DJElbow Sep 18 '13 at 02:25
  • Can you just show `df.head()` or some other subset of your data instead of trying to describe it? Thanks. – Phillip Cloud Sep 18 '13 at 03:07
  • I have found a workaround. For some reason, when I plot the third example above, matplotlib won't play nice with my TimeSeries. If I rebuild the index with the code below and then plot (with the same DateFormatter() call), it works fine. `df2 = df[df['browser']=='Chrome/29']['metric2']; df2.index = df2.index.astype(datetime.datetime);` – DJElbow Sep 18 '13 at 22:24
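
One possible explanation for the year 1048, offered as a hypothesis that this thread does not confirm: pandas may plot a regular hourly series using hour counts since 1970-01-01 as x values, while the matplotlib of that era read x values as day counts starting at 0001-01-01. The standard-library sketch below only checks that arithmetic; the helper names are made up for illustration and are not part of either library.

```python
import datetime as dt

def hours_since_1970(ts):
    """Whole hours between 1970-01-01 00:00 and ts (hypothetical helper)."""
    return int((ts - dt.datetime(1970, 1, 1)).total_seconds() // 3600)

def read_as_day_ordinal(n):
    """Interpret n as a day count where 1 is 0001-01-01 (hypothetical helper)."""
    return dt.date.fromordinal(n)

# A timestamp from the question, assumed to sit on an hourly index.
stamp = dt.datetime(2013, 8, 20, 0)
hours = hours_since_1970(stamp)
misread = read_as_day_ordinal(hours)

print(hours)         # 382488
print(misread.year)  # 1048
```

If that is what is happening, rebuilding the index from plain datetime objects, as in the workaround above, would sidestep it, because matplotlib then does its own date conversion.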

1 Answer


We need a concrete set of data and a program to refer to. With the following, I see no problems:

data.txt:

2013-08-18 00   IE  1000    500 3000
2013-08-19 00   FF  2000    250 6000
2013-08-20 00   Opera   3000    450 9000
2001-03-21 00   Chrome/29   3000    450 9000
2013-08-21 00   Chrome/29   3000    450 9000
2014-01-22 00   Chrome/29   3000    750 9000


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as md
import datetime as dt


df = pd.read_table(
    'data.txt', 
    index_col=0, 
    parse_dates=True,
    date_parser=lambda s: dt.datetime.strptime(s, '%Y-%m-%d %H'),
    header=None,
    names=['browser', 'metric1', 'metric2', 'metric3']
)

print(df)

df[df['browser']=='Chrome/29']['metric2'].plot()
ax = plt.gca()
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d %H %Y'))
plt.draw()
plt.show()


--output:--
              browser  metric1  metric2  metric3
2013-08-18         IE     1000      500     3000
2013-08-19         FF     2000      250     6000
2013-08-20      Opera     3000      450     9000
2001-03-21  Chrome/29     3000      450     9000
2013-08-21  Chrome/29     3000      450     9000
2014-01-22  Chrome/29     3000      750     9000


And with the axes adjusted so you can see the points better (setting the date range of the x axis and the range of the y axis):

...
df[df['browser']=='Chrome/29']['metric2'].plot(style='r--')
ax = plt.gca()
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d %H %Y'))

ax.set_xlim(dt.datetime(2000, 1, 1), dt.datetime(2017, 1, 1))
ax.set_ylim(400, 1000)
...
...


As long as you refuse to post a minimal example along with the data that produces the output you don't want...

7stud
  • I fail to see why this answer got a down vote – tacaswell Sep 18 '13 at 03:59
  • I originally downvoted because all this answer does is show the expected behavior (not *really* that helpful since the OP wasn't seeing this behavior). However, a downvote was probably a bit overkill. My apologies. – Phillip Cloud Sep 18 '13 at 13:58
  • Sorry about the delayed response. The sample I was preparing is just like the one above. The only difference is that my index has the name 'hour' (which is the label of the column in the original file). I created a new file containing just the first 5 rows of the original to rerun the analysis. When I do this, my dates in matplotlib appear as expected. Is it possible that a certain value in the TimeSeries from the original file is causing the issue? Just from scanning the unique values in the original file, I don't see any problems. – DJElbow Sep 18 '13 at 15:45
  • _The only difference is that my index has the name 'hour'_ Can you explain what that means? – 7stud Sep 18 '13 at 22:45