
I'm an absolute pandas/matplotlib beginner and I can't figure out this issue after quite a few searches.

I just learned that in order to format dates (basically, to space them out), I need to work with an additional object called fig (for figure):

fig, tg = plt.subplots(1)
tg.plot(pandoc['date_time'], pandoc['total_goals'], kind="bar")
tg.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
fig.autofmt_xdate()

However, when I try to change the plot kind to bar, I get the following error:

AttributeError: Unknown property kind

It worked perfectly when I simply did

pandoc['total_goals'].plot(kind='bar')

But then mdates.DateFormatter wouldn't work.

I'm missing something. What is it?

zerohedge

1 Answer


Pandas DataFrames, such as pandoc, have a plot method with a kind parameter. So it is possible to make a plot using

pandoc.plot(x='date_time', y='total_goals', kind="bar", ax=tg)

Notice that ax=tg is used to tell pandoc to draw on the matplotlib Axes, tg.


In contrast, matplotlib Axes, such as tg, have a plot method, but tg.plot does not have a kind parameter. Instead, to make a bar plot with an Axes object, call its bar method, tg.bar.
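
As a rough sketch of that second route (it is not part of the example further down): passing the datetime column to tg.bar as x keeps the x-axis values as dates, so mdates.DateFormatter can be attached to the axis. The synthetic data, the 'MS' (month-start) frequency, the bar width of 20 days, and the Agg backend line are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption)
N = 24
pandoc = pd.DataFrame({
    'date_time': pd.date_range('2000-01-01', periods=N, freq='MS'),
    'total_goals': np.random.default_rng(2016).integers(10, size=N),
})

fig, tg = plt.subplots(1)
# Axes.bar takes x and height directly; there is no kind parameter.
# With datetime x values, a numeric width is interpreted in days.
tg.bar(pandoc['date_time'], pandoc['total_goals'], width=20)

# The x values are dates here, so a date formatter applies to the axis
tg.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate()
# Call plt.show() here when using an interactive backend
```

With monthly data a width of about 20 days leaves a small gap between bars; for other spacings the width would need adjusting.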


Using the pandoc.plot method, you could make a bar plot using something like

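import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
np.random.seed(2016)  # make the random data reproducible

# Build 150 months of made-up data
# ('M' means month-end frequency; recent pandas versions spell this 'ME')
N = 150
pandoc = pd.DataFrame({'date_time':pd.date_range('2000-1-1', periods=N, freq='M'),
                   'total_goals':np.random.randint(10, size=N)})
fig, tg = plt.subplots(1)
# Let pandas draw the bars on the existing Axes, tg
pandoc.plot(x='date_time', y='total_goals', kind="bar", ax=tg)

# Label every 10th bar, starting at bar 5; all other labels stay empty
labels, skip = ['']*N, 10
labels[skip//2::skip] = pandoc['date_time'].dt.strftime('%Y-%m-%d')[skip//2::skip]
tg.set_xticklabels(labels)

# Rotate and right-align the labels so they do not overlap
fig.autofmt_xdate()
plt.show()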


Note that tg.set_xticklabels is used to set the xticklabels instead of mdates.DateFormatter. When making a bar plot with pandas, the underlying xtick values are integers (one per bar), not dates. For example, for a plot with 10 bars:

In [21]: tg.get_xticks()
Out[21]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can only use mdates.DateFormatter when the xtick values are dates. Since a bar plot has a fixed number of bars, there is no advantage to using a dynamic formatter like mticker.FuncFormatter; you are best off simply setting the xticklabels using the Axes.set_xticklabels method.
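
To check this on your own data, something like the following minimal sketch should do. It assumes a small synthetic DataFrame (the name df, N=12, and the 'MS' frequency are made up for illustration) and selects the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small synthetic frame standing in for the real data (assumption)
N = 12
df = pd.DataFrame({'date_time': pd.date_range('2000-01-01', periods=N, freq='MS'),
                   'total_goals': np.arange(N) % 10})

fig, ax = plt.subplots(1)
df.plot(x='date_time', y='total_goals', kind='bar', ax=ax)

# The tick positions are bar indices, one per bar, not dates
ticks = ax.get_xticks()
print(ticks)

# Since the ticks are fixed, the labels can simply be set as strings
labels = df['date_time'].dt.strftime('%Y-%m-%d').tolist()
ax.set_xticklabels(labels)
```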


labels = ['']*N

creates a list of N empty strings. For example, ['']*2 evaluates to ['', ''].

x, y = a, b

is equivalent to

x = a
y = b

So labels, skip = ['']*N, 10 is equivalent to

labels = ['']*N
skip = 10

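Python slice notation, e.g. x[start:end:step], selects every step-th item beginning at index start and stopping before index end; if end is omitted, the slice runs to the end of the sequence. For example,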

In [227]: x = list('ABCDEFGHIJK'); x
Out[227]: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

In [228]: x[1::3]
Out[228]: ['B', 'E', 'H', 'K']    <-- the first item is x[1], and then we pick up every 3rd item

So in the code above, pandoc['date_time'].dt.strftime('%Y-%m-%d') is a sequence of strings and if we call it x, then x[skip//2::skip] is a new sequence which starts with x[skip//2] and then steps by skip amount.


skip//2 divides skip by 2 using integer-division.


labels begins as a list of N empty strings. With skip=10, the assignment

labels[skip//2::skip] = pandoc['date_time'].dt.strftime('%Y-%m-%d')[skip//2::skip]

replaces every 10th element (starting at skip//2) with a date string from pandoc['date_time'].dt.strftime('%Y-%m-%d').
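
Here is a small pure-Python sketch of the same assignment, using shorter made-up values (N=20, skip=5) and placeholder strings instead of real dates so the result is easy to read. Note that assigning to a slice with a step requires both sides to have the same number of items, which holds here because both slices use the same start and step on sequences of equal length.

```python
N, skip = 20, 5  # illustrative values; the example above uses N=150, skip=10

# Placeholder strings standing in for the formatted dates, one per bar
dates = ['d{}'.format(i) for i in range(N)]

labels = [''] * N
# Both slices pick positions 2, 7, 12, 17, so their lengths match
labels[skip//2::skip] = dates[skip//2::skip]
print(labels)

# Positions that get a label, with and without the skip//2 offset
centered = list(range(N))[skip//2::skip]   # [2, 7, 12, 17]
left_aligned = list(range(N))[::skip]      # [0, 5, 10, 15]
print(centered, left_aligned)
```

Without the offset the labels start at the left edge and the last few bars have none; starting at skip//2 shifts them toward the middle of each group.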


pandoc['date_time'] is a time series. pandoc['date_time'].dt.strftime('%Y-%m-%d') uses the Series.dt.strftime method to format the dates into date-strings in %Y-%m-%d format.
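
A tiny illustration of Series.dt.strftime on its own, using three made-up dates (the variable names are just for this sketch):

```python
import pandas as pd

# A Series of datetimes, like pandoc['date_time']
s = pd.Series(pd.to_datetime(['2000-01-31', '2000-02-29', '2000-03-31']))

# .dt.strftime returns a Series of plain strings in the given format
formatted = s.dt.strftime('%Y-%m-%d')
print(formatted.tolist())  # ['2000-01-31', '2000-02-29', '2000-03-31']
```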

unutbu
  • Thank you for the great answer. However, I'm getting the dates squished again. (Not spaced - I have over 150 bars) – zerohedge Dec 25 '16 at 19:00
  • I'm assuming it's because I'm not setting the periods the way I'm reading to the DataFrame (I'm reading from an SQL query) – zerohedge Dec 25 '16 at 19:06
  • Do you want to group some of the dates together (i.e. make a histogram) or do you want 150 bars? If you want 150 bars, do you want an xticklabel for each bar, or can we label, say, every 10th bar? – unutbu Dec 25 '16 at 19:09
  • I want 150 bars, not all of them need to be labeled though (not even sure I'm using the right terminology) – zerohedge Dec 25 '16 at 19:14
  • I've modified the example to show how you could label every Nth bar. – unutbu Dec 25 '16 at 19:20
  • Thanks again. While I don't really understand what the lines `labels, skip = ['']*N, 10` and `labels[skip//2::skip] = pandoc['date_time'].dt.strftime('%Y-%m-%d')[skip//2::skip]` are doing yet, this is one of the more elaborate and concise answers I've had here. Cheers! – zerohedge Dec 26 '16 at 12:38
  • I've added a bit of explanation for those two lines. – unutbu Dec 26 '16 at 13:00
  • Wonderful. Why are we doing `skip//2` and not just `skip`? – zerohedge Dec 26 '16 at 13:11
  • It's easiest to see with an example. Try changing the code above to `labels[::skip] = pandoc['date_time'].dt.strftime('%Y-%m-%d')[::skip]`. You'll see the labels begin at the left edge, and there is space on the right with no label. I tried to "fix" that by shifting the labels by half the skip amount so that the labels are more centered relative to the edges of the plot. – unutbu Dec 26 '16 at 13:18