grouping and plotting data by time in python

Question

I have a csv file and I am trying to plot the average of some values per month. My csv file is structured as shown below, so I believe that I should group my data daily, then monthly in order to calculate the mean value.

timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

I am using this code:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

df= pd.read_csv("ave.txt", sep=',', names =["timestamp","heure","lat","lon","impact","type"])
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

But, I keep getting errors like this:

KeyError: 'The grouper name timestamp is not found'

any ideas ??

score 3 · Answer 1 · answered Jun 01 '17 at 11:56

3

You're getting this error because you've set the timestamp column to index. Try removing key='timestamp' from TimeGrouper() or the set_index method and it should group as you expect:

daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D', axis=1), axis=1)['impact'].count()

or

daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()

answered Jun 01 '17 at 11:56

Andrew L

6,618
3
26
30

thank you for your answer , I did as you said and now i am getting this error : IndexError: indices are out-of-bounds – Mar Jun 01 '17 at 11:59
What line does this occur on? – Andrew L Jun 01 '17 at 12:00
on this line : daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count() – Mar Jun 01 '17 at 12:02
Are you intentionally using `axis=1` in `groupby()`? Can you try without that argument and see if it's what you expect? – Andrew L Jun 01 '17 at 12:05
when i try without that argument , it gives another error : axis must be a DatetimeIndex, but got an instance of 'Index' – Mar Jun 01 '17 at 12:12

score 1 · Answer 2 · edited Sep 27 '17 at 16:44

1

I believe you need DataFrame.resample.

Also is necessary convert timestamp to DataTimeindex by parameter parse_dates and index_col in read_csv.

names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('fou.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'])
print (data.head())

#your code
daily = data.groupby(pd.TimeGrouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

#more simpliest
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()

Also check if really need count, not size. What is the difference between size and count in pandas?

daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()

edited Sep 27 '17 at 16:44

Graham

7,431
18
59
84

answered Jun 01 '17 at 12:06

jezrael

822,522
95
1,334
1,252

thank you, i did as you said and it gave me this error : raise TypeError('Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex') TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex – Mar Jun 01 '17 at 12:12
skina@skina:~/Documents/density$ python moyenne.py heure lat lon impact type timestamp timestamp heure lat lon impact type 2007-01-01 00:00:00 13:58:43 33.837 -9.205 10.3 1 2007-01-02 00:00:00 00:07:28 34.5293 -10.2384 17.7 1 2007-01-02 00:00:00 23:01:03 35.0617 -1.435 -17.1 2 2007-01-03 00:00:00 01:14:29 36.5685 0.9043 36.8 1 – Mar Jun 01 '17 at 12:16
Actually i have a big csv file, what i posted here is just a little sample of my data, how can I convert the whole csv file ? – Mar Jun 01 '17 at 12:23
your code is working fine, but it's not giving me the results i need. – Mar Jun 01 '17 at 12:32
i need to plot the average of impacts per month – Mar Jun 01 '17 at 12:33
i checked data for one month ,and it's giving this error : KeyError: 'the label [2017-01] is not in the [index]' – Mar Jun 01 '17 at 12:39
well i think that i am mistaken, i need to plot the average of impacts per month , but i have multiple lines of the same day but in different times (hours), so i need first to group data daily ( a day could has got a lot of impacts) , then group it monthly and plot it , so i believe i should take into consideration the second column which is heure. if you know what i mean. – Mar Jun 01 '17 at 12:46
my desired output, is to plot the month on x axix and mean of impacts per month on y axix – Mar Jun 01 '17 at 12:50
it's not working , it says : raise TypeError('Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex') TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex – Mar Jun 01 '17 at 13:00
shall i send you my csv file to picture my data ? – Mar Jun 01 '17 at 13:00
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/145640/discussion-between-marie-antoinette-and-jezrael). – Mar Jun 01 '17 at 13:10

grouping and plotting data by time in python

2 Answers2