
I'm currently attempting to graph a fairly small dataset using the matplotlib and pandas libraries. The dataset is stored in a CSV file:

DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
1948-06-01,3.6
1948-07-01,3.6
1948-08-01,3.9
1948-09-01,3.8
1948-10-01,3.7
1948-11-01,3.8
1948-12-01,4.0

I loaded the dataset using pandas (the file that holds the dataset is named 'dataset.csv'):

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dataset.csv')
dataset['DATE'] = pd.to_datetime(dataset['DATE'])

I then attempted to plot the dataset loaded, using matplotlib:

plt.plot(dataset['DATE'], dataset['UNRATE'])
plt.show()

The code above mostly worked and displayed the following graph:

[Figure: line plot of UNRATE against DATE]

The problem, however, is that the dates on the x-axis seem to be labeled only at every second month:

[Figure: close-up of the x-axis tick labels]

I found the question "Changing the 'tick frequency' on x or y axis in matplotlib?", which is related to my problem. But from my testing, its solution only seems to work with integer values.

I also found the question "controlling the number of x ticks in pyplot", which seemed to provide a solution to my problem. The method its answer uses, to_pydatetime, is a method of DatetimeIndex. Since my understanding is that pandas.to_datetime returns a DatetimeIndex by default, I assumed I could call to_pydatetime on dataset['DATE']:

plt.xticks(dataset['DATE'].to_pydatetime())

However, I instead received the error:

AttributeError: 'Series' object has no attribute 'to_pydatetime'

Since this appears to be default behavior, is there a way to force matplotlib to show a tick label for each point along the x-axis, rather than only for every other point?

Christian Dean
  • Nice to see you tinkering with some pandas :) – cs95 Mar 25 '18 at 01:28
  • Thanks, I actually just started seriously learning it today. As you can see though, I'm not very familiar with the library :-( – Christian Dean Mar 25 '18 at 01:30
  • The simplest fix is usually plotting strings (getting datetime out of the equation completely), `dataset['DATE'].astype(str)` – cs95 Mar 25 '18 at 01:31
  • Thanks @cᴏʟᴅsᴘᴇᴇᴅ, that worked. Is there any way to avoid having to re-cast `dataset['DATE']` as a `str` though? It seemed best to use the appropriate `dtype` since I _was_ working with dates, but if there's not, I don't think it would be too big of an issue. – Christian Dean Mar 25 '18 at 01:34
  • This is the "I don't know any better so this is good enough" fix... hopefully @ImportanceOfBeingErnest knows a thing or two that I don't. He frequents the matplotlib tag a lot more than anyone else does, so hopefully he sees your question. – cs95 Mar 25 '18 at 01:35
  • Okay, gotcha @cᴏʟᴅsᴘᴇᴇᴅ, thanks. For now I can just leave `dataset['DATE']` with a `dtype` of `str`. – Christian Dean Mar 25 '18 at 01:37
  • To convert `dataset['DATE']` to Python datetimes, you need to use `dataset['DATE'].dt.to_pydatetime()`: the `.dt` accessor reaches the pandas datetime values, and `to_pydatetime()` converts them to Python `datetime` objects! – willk Mar 25 '18 at 01:55
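As a rough sketch of why the `AttributeError` appears (assuming a reasonably recent pandas): `pd.to_datetime` returns a Series when given a Series and a `DatetimeIndex` only when given a list-like such as a plain Python list. `to_pydatetime()` lives directly on `DatetimeIndex`, but on a Series it is reached through the `.dt` accessor. The variable names below are illustrative. Newer pandas versions may emit a `FutureWarning` about the return type of `.dt.to_pydatetime()`; the values are the same either way.

```python
import datetime

import pandas as pd

strings = ["1948-01-01", "1948-02-01", "1948-03-01"]

# A Series in gives a Series (datetime64 dtype) out, not a DatetimeIndex
as_series = pd.to_datetime(pd.Series(strings))
# A plain list in gives a DatetimeIndex out
as_index = pd.to_datetime(strings)

# DatetimeIndex has to_pydatetime() directly...
py_from_index = as_index.to_pydatetime()
# ...while a Series only exposes it through the .dt accessor
py_from_series = as_series.dt.to_pydatetime()

# The Series itself has no such attribute, hence the AttributeError
print(type(as_series).__name__, type(as_index).__name__)
print(hasattr(as_series, "to_pydatetime"))
```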

1 Answer

To get rid of the error you may convert the dates as follows and also set the labels accordingly:

plt.xticks(dataset['DATE'].tolist(),dataset['DATE'].tolist())

or, as has been mentioned in the comments:

plt.xticks(dataset['DATE'].dt.to_pydatetime(),dataset['DATE'].dt.to_pydatetime()) 

[Figure: plot with a tick label at every date]
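A self-contained sketch of this first fix: it inlines the question's twelve rows with `io.StringIO` (an assumption, so no `dataset.csv` is needed), uses the non-interactive `Agg` backend, and formats the labels with `strftime` rather than passing the Timestamps themselves, whose default string form includes a `00:00:00` time part.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The question's data, inlined instead of read from dataset.csv
CSV_TEXT = """DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
1948-06-01,3.6
1948-07-01,3.6
1948-08-01,3.9
1948-09-01,3.8
1948-10-01,3.7
1948-11-01,3.8
1948-12-01,4.0
"""

dataset = pd.read_csv(io.StringIO(CSV_TEXT))
dataset["DATE"] = pd.to_datetime(dataset["DATE"])

fig, ax = plt.subplots()
ax.plot(dataset["DATE"], dataset["UNRATE"])

# One tick per row of the data, labelled with just the date part
dates = dataset["DATE"].tolist()
ax.set_xticks(dates)
ax.set_xticklabels([d.strftime("%Y-%m-%d") for d in dates], rotation=45, ha="right")

fig.canvas.draw()  # make sure the tick label texts are populated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```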

But let's look at some more useful options.

Plotting strings

First of all, it is possible to plot the data as is, i.e. as strings.

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dateunrate.txt')
plt.plot(dataset['DATE'], dataset['UNRATE'])

plt.setp(plt.gca().get_xticklabels(), rotation=45, ha="right")
plt.show()

[Figure: dates plotted as strings, with every date labeled]

This is just like plotting plt.plot(["apple", "banana", "cherry"], [1,2,3]). The successive dates are simply placed one by one on the axis, regardless of whether they are a minute, a day or a year apart. E.g. if your dates were 2018-01-01, 2018-01-03 and 2018-01-27, they would still appear equally spaced on the axis.
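A small sketch of that difference, assuming the headless `Agg` backend: it plots the three example dates once as strings and once as parsed datetimes, then reads back the x-positions matplotlib assigned. `Line2D.get_xydata()` returns the data after unit conversion, so the string version yields category positions and the datetime version yields day numbers on matplotlib's date scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = ["2018-01-01", "2018-01-03", "2018-01-27"]
values = [1, 2, 3]

fig, (ax_str, ax_dt) = plt.subplots(1, 2)

# Strings are treated as categories: each one gets the next integer slot
(line_str,) = ax_str.plot(dates, values)
# Parsed datetimes are placed according to the real time between them
(line_dt,) = ax_dt.plot(pd.to_datetime(dates), values)

str_positions = line_str.get_xydata()[:, 0]
dt_positions = line_dt.get_xydata()[:, 0]

print(str_positions)           # evenly spaced category positions
print(np.diff(dt_positions))   # gaps in days between the dates
```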

Plot dates with pandas (automatically)

Pandas can plot dates nicely out of the box if the dates are in the index of the dataframe. To this end, you may read the CSV so that its first column is parsed as dates and used as the index.

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dateunrate.txt', parse_dates=[0], index_col=0)
dataset.plot()

plt.show() 

[Figure: pandas line plot with a tick label for each month]

This is equivalent to

dataset = pd.read_csv('dateunrate.txt', parse_dates=[0])
dataset = dataset.set_index("DATE")
dataset.plot()

or

dataset = pd.read_csv('dateunrate.txt')
dataset["DATE"] = pd.to_datetime(dataset["DATE"])
dataset = dataset.set_index("DATE")
dataset.plot()

or even

dataset = pd.read_csv('dateunrate.txt')
dataset["DATE"] = pd.to_datetime(dataset["DATE"])
dataset.plot(x="DATE",y="UNRATE")

This works nicely here because you happen to have one date per month, so pandas decides to show all 12 months as tick labels.
In other cases, pandas may choose different tick locations.
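A sketch that checks the three reading variants above against each other, using the question's data inlined via `io.StringIO` instead of reading `dateunrate.txt` from disk. Each variant should produce a `DatetimeIndex` named `DATE` with the same values, which is what lets `dataset.plot()` treat the x-axis as time.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

CSV_TEXT = """DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
1948-06-01,3.6
1948-07-01,3.6
1948-08-01,3.9
1948-09-01,3.8
1948-10-01,3.7
1948-11-01,3.8
1948-12-01,4.0
"""

# Variant 1: parse the first column as dates and make it the index in one call
df_a = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=[0], index_col=0)

# Variant 2: parse while reading, then move DATE into the index
df_b = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=[0]).set_index("DATE")

# Variant 3: read as strings, convert afterwards, then set the index
df_c = pd.read_csv(io.StringIO(CSV_TEXT))
df_c["DATE"] = pd.to_datetime(df_c["DATE"])
df_c = df_c.set_index("DATE")

for df in (df_a, df_b, df_c):
    print(type(df.index).__name__, df.index.name, list(df.columns))

ax = df_a.plot()  # pandas picks the date ticks itself
plt.close(ax.figure)
```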

Plot dates with matplotlib or pandas (manually)

In the general case, you may use matplotlib.dates formatters and locators to tweak the tick(label)s in the way you want. Here, we might use a MonthLocator and set the ticklabel format to "%b %Y". This works well with matplotlib plot or pandas plot(x_compat=True).

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates

dataset = pd.read_csv('dateunrate.txt', parse_dates=[0], index_col=0)

plt.plot(dataset.index, dataset['UNRATE'])
## or use 
#dataset.plot(x_compat=True) #note the x_compat argument

plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

plt.setp(plt.gca().get_xticklabels(), rotation=45, ha="right")
plt.show()

[Figure: plot with one "%b %Y" tick label per month]
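The same locator/formatter idea written against the object-oriented API, as a sketch: `fig.autofmt_xdate` is swapped in for the `plt.setp` call (it rotates and right-aligns the date labels), and the data is inlined with `io.StringIO` so the snippet runs on its own with the `Agg` backend. The expected labels assume matplotlib's default axis margins and the default C locale for `%b`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

CSV_TEXT = """DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
1948-06-01,3.6
1948-07-01,3.6
1948-08-01,3.9
1948-09-01,3.8
1948-10-01,3.7
1948-11-01,3.8
1948-12-01,4.0
"""

dataset = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=[0], index_col=0)

fig, ax = plt.subplots()
ax.plot(dataset.index, dataset["UNRATE"])

# One major tick on the first day of every month, labelled like "Jan 1948"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Rotate and right-align the date labels
fig.autofmt_xdate(rotation=45)

fig.canvas.draw()  # make sure the tick label texts are populated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

For a longer series, `mdates.MonthLocator(interval=3)` would label every third month instead of every month.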

ImportanceOfBeingErnest