I have a query that I run that outputs a list of data consisting of a date string and a count:
date_cnts = [(u'2014-06-27', 1),
(u'2014-06-29', 3),
(u'2014-06-30', 1),
(u'2014-07-01', 1),
(u'2014-07-02', 1),
(u'2014-07-09', 1),
(u'2014-07-10', 3),
(u'2014-07-11', 1),
(u'2014-07-12', 2),
(u'2014-07-14', 1),
(u'2014-07-15', 2),
(u'2014-07-17', 3),
(u'2014-07-18', 1),
(u'2014-07-20', 1),
(u'2014-07-21', 1),
(u'2014-07-23', 2),
(u'2014-07-26', 2),
(u'2014-07-27', 2),
(u'2014-07-28', 7),
(u'2014-07-29', 3),
(u'2014-07-31', 2),
(u'2014-08-01', 1),
(u'2014-08-05', 4),
(u'2014-08-07', 2),
(u'2014-08-08', 1),
(u'2014-08-13', 1),
(u'2014-08-14', 3),
(u'2014-08-15', 1),
(u'2014-08-16', 6),
(u'2014-08-17', 1),
(u'2014-08-18', 1),
(u'2014-08-20', 1),
(u'2014-08-24', 1),
(u'2014-08-25', 3),
(u'2014-08-29', 1),
(u'2014-08-30', 1),
(u'2014-09-03', 3),
(u'2014-09-13', 1),
(u'2014-09-14', 1),
(u'2014-09-24', 3),
(u'2014-10-20', 1),
(u'2014-10-24', 1),
(u'2014-11-05', 3),
(u'2014-11-09', 1),
(u'2014-11-12', 1),
(u'2014-11-13', 1),
(u'2014-11-14', 1),
(u'2014-11-18', 1),
(u'2014-11-19', 4),
(u'2014-11-22', 1),
(u'2014-11-26', 3),
(u'2014-11-28', 3),
(u'2014-12-01', 2),
(u'2014-12-02', 2),
(u'2014-12-04', 2),
(u'2014-12-05', 1),
(u'2014-12-06', 5),
(u'2014-12-11', 1),
(u'2014-12-15', 10)]
Notice that there are date gaps in this data set, indicating that the missing dates have a value of 0
.
My working (non-Pandas) version of code looks like this:
from matplotlib import pyplot as plt
x_val = [datetime.strptime(x[0],'%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]
plt.bar(x_val, y_val)
plt.grid(True)
plt.show()
This outputs this image:
Now, if I convert my query results to a Panda's dataframe
Date Count
0 2014-06-27 1
1 2014-06-29 3
2 2014-06-30 1
3 2014-07-01 1
4 2014-07-02 1
5 2014-07-09 1
6 2014-07-10 3
7 2014-07-11 1
8 2014-07-12 2
9 2014-07-14 1
10 2014-07-15 2
11 2014-07-17 3
12 2014-07-18 1
13 2014-07-20 1
14 2014-07-21 1
15 2014-07-23 2
16 2014-07-26 2
17 2014-07-27 2
18 2014-07-28 7
19 2014-07-29 3
20 2014-07-31 2
21 2014-08-01 1
22 2014-08-05 4
23 2014-08-07 2
24 2014-08-08 1
25 2014-08-13 1
26 2014-08-14 3
27 2014-08-15 1
28 2014-08-16 6
29 2014-08-17 1
30 2014-08-18 1
31 2014-08-20 1
32 2014-08-24 1
33 2014-08-25 3
34 2014-08-29 1
35 2014-08-30 1
36 2014-09-03 3
37 2014-09-13 1
38 2014-09-14 1
39 2014-09-24 3
40 2014-10-20 1
41 2014-10-24 1
42 2014-11-05 3
43 2014-11-09 1
44 2014-11-12 1
45 2014-11-13 1
46 2014-11-14 1
47 2014-11-18 1
48 2014-11-19 4
49 2014-11-22 1
50 2014-11-26 3
51 2014-11-28 3
52 2014-12-01 2
53 2014-12-02 2
54 2014-12-04 2
55 2014-12-05 1
56 2014-12-06 5
57 2014-12-11 1
58 2014-12-15 10
And utilize the simple Panda's wrapper to plot this:
plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='Date', y=u'Count')
plt.show()
I get this result. Notice that my missing days do not appear in this graph.
How do I readd the gaps (and 0
values) where my dates do not exist in the DataFrame?
The reason I want to utilize Pandas is to take advantage of some of it's other features (most importantly, a rolling average).