Create plot with Pandas and show similar output as with Matplotlib directly

Question

I have a query that I run that outputs a list of data consisting of a date string and a count:

date_cnts = [(u'2014-06-27', 1),
 (u'2014-06-29', 3),
 (u'2014-06-30', 1),
 (u'2014-07-01', 1),
 (u'2014-07-02', 1),
 (u'2014-07-09', 1),
 (u'2014-07-10', 3),
 (u'2014-07-11', 1),
 (u'2014-07-12', 2),
 (u'2014-07-14', 1),
 (u'2014-07-15', 2),
 (u'2014-07-17', 3),
 (u'2014-07-18', 1),
 (u'2014-07-20', 1),
 (u'2014-07-21', 1),
 (u'2014-07-23', 2),
 (u'2014-07-26', 2),
 (u'2014-07-27', 2),
 (u'2014-07-28', 7),
 (u'2014-07-29', 3),
 (u'2014-07-31', 2),
 (u'2014-08-01', 1),
 (u'2014-08-05', 4),
 (u'2014-08-07', 2),
 (u'2014-08-08', 1),
 (u'2014-08-13', 1),
 (u'2014-08-14', 3),
 (u'2014-08-15', 1),
 (u'2014-08-16', 6),
 (u'2014-08-17', 1),
 (u'2014-08-18', 1),
 (u'2014-08-20', 1),
 (u'2014-08-24', 1),
 (u'2014-08-25', 3),
 (u'2014-08-29', 1),
 (u'2014-08-30', 1),
 (u'2014-09-03', 3),
 (u'2014-09-13', 1),
 (u'2014-09-14', 1),
 (u'2014-09-24', 3),
 (u'2014-10-20', 1),
 (u'2014-10-24', 1),
 (u'2014-11-05', 3),
 (u'2014-11-09', 1),
 (u'2014-11-12', 1),
 (u'2014-11-13', 1),
 (u'2014-11-14', 1),
 (u'2014-11-18', 1),
 (u'2014-11-19', 4),
 (u'2014-11-22', 1),
 (u'2014-11-26', 3),
 (u'2014-11-28', 3),
 (u'2014-12-01', 2),
 (u'2014-12-02', 2),
 (u'2014-12-04', 2),
 (u'2014-12-05', 1),
 (u'2014-12-06', 5),
 (u'2014-12-11', 1),
 (u'2014-12-15', 10)]

Notice that there are date gaps in this data set, indicating that the missing dates have a value of 0.

My working (non-Pandas) version of code looks like this:

from matplotlib import pyplot as plt
x_val = [datetime.strptime(x[0],'%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]
plt.bar(x_val, y_val)
plt.grid(True)
plt.show()

This outputs this image:

Matplotlib output

Now, if I convert my query results to a Panda's dataframe

          Date  Count
0   2014-06-27      1
1   2014-06-29      3
2   2014-06-30      1
3   2014-07-01      1
4   2014-07-02      1
5   2014-07-09      1
6   2014-07-10      3
7   2014-07-11      1
8   2014-07-12      2
9   2014-07-14      1
10  2014-07-15      2
11  2014-07-17      3
12  2014-07-18      1
13  2014-07-20      1
14  2014-07-21      1
15  2014-07-23      2
16  2014-07-26      2
17  2014-07-27      2
18  2014-07-28      7
19  2014-07-29      3
20  2014-07-31      2
21  2014-08-01      1
22  2014-08-05      4
23  2014-08-07      2
24  2014-08-08      1
25  2014-08-13      1
26  2014-08-14      3
27  2014-08-15      1
28  2014-08-16      6
29  2014-08-17      1
30  2014-08-18      1
31  2014-08-20      1
32  2014-08-24      1
33  2014-08-25      3
34  2014-08-29      1
35  2014-08-30      1
36  2014-09-03      3
37  2014-09-13      1
38  2014-09-14      1
39  2014-09-24      3
40  2014-10-20      1
41  2014-10-24      1
42  2014-11-05      3
43  2014-11-09      1
44  2014-11-12      1
45  2014-11-13      1
46  2014-11-14      1
47  2014-11-18      1
48  2014-11-19      4
49  2014-11-22      1
50  2014-11-26      3
51  2014-11-28      3
52  2014-12-01      2
53  2014-12-02      2
54  2014-12-04      2
55  2014-12-05      1
56  2014-12-06      5
57  2014-12-11      1
58  2014-12-15     10

And utilize the simple Panda's wrapper to plot this:

plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='Date', y=u'Count')
plt.show()

I get this result. Notice that my missing days do not appear in this graph.

Pandas output

How do I readd the gaps (and 0 values) where my dates do not exist in the DataFrame?

The reason I want to utilize Pandas is to take advantage of some of it's other features (most importantly, a rolling average).

score 1 · Accepted Answer · edited May 23 '17 at 11:50

I wrote a working version, probably not the best but it will do the job. It is based on reindexing your original data into a DataFrame with a sample for everyday.

import pandas as pd
import matplotlib.pyplot as plt

#%% make data
df =  pd.DataFrame(date_cnts)
df.columns = ['Date', 'Count']

#%% make dataframe with everyday sampling
df.index = pd.to_datetime(df['Date'])
startdate = df.index[0]
enddate = df.index[-1]
df_new = df.reindex(pd.date_range(startdate, enddate, freq='1D'))

#%% plot the results
df_new['Count'].plot(kind='bar')

# decrease number of days 
new_xticks = plt.xticks()[0][1:-1:10]
plt.xticks(new_xticks)

enter image description here

For further formatting of the xticks I recommend this question: Pandas timeseries plot setting x-axis major and minor ticks and labels

Create plot with Pandas and show similar output as with Matplotlib directly

1 Answers1