4

I have a csv file and I am trying to plot the average of some values per month. My csv file is structured as shown below, so I believe that I should group my data daily, then monthly in order to calculate the mean value.

timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

I am using this code:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

df= pd.read_csv("ave.txt", sep=',', names =["timestamp","heure","lat","lon","impact","type"])
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

But, I keep getting errors like this:

KeyError: 'The grouper name timestamp is not found'

any ideas ??

VaM999
  • 453
  • 1
  • 9
  • 23
Mar
  • 419
  • 1
  • 7
  • 19

2 Answers2

3

You're getting this error because you've set the timestamp column to index. Try removing key='timestamp' from TimeGrouper() or the set_index method and it should group as you expect:

daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D', axis=1), axis=1)['impact'].count()

or

daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
Andrew L
  • 6,618
  • 3
  • 26
  • 30
  • thank you for your answer , I did as you said and now i am getting this error : IndexError: indices are out-of-bounds – Mar Jun 01 '17 at 11:59
  • What line does this occur on? – Andrew L Jun 01 '17 at 12:00
  • on this line : daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count() – Mar Jun 01 '17 at 12:02
  • Are you intentionally using `axis=1` in `groupby()`? Can you try without that argument and see if it's what you expect? – Andrew L Jun 01 '17 at 12:05
  • when i try without that argument , it gives another error : axis must be a DatetimeIndex, but got an instance of 'Index' – Mar Jun 01 '17 at 12:12
1

I believe you need DataFrame.resample.

Also is necessary convert timestamp to DataTimeindex by parameter parse_dates and index_col in read_csv.

names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('fou.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'])
print (data.head())

#your code
daily = data.groupby(pd.TimeGrouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

#more simpliest
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()

graph

Also check if really need count, not size. What is the difference between size and count in pandas?

daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
Graham
  • 7,431
  • 18
  • 59
  • 84
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thank you, i did as you said and it gave me this error : raise TypeError('Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex') TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex – Mar Jun 01 '17 at 12:12
  • skina@skina:~/Documents/density$ python moyenne.py heure lat lon impact type timestamp timestamp heure lat lon impact type 2007-01-01 00:00:00 13:58:43 33.837 -9.205 10.3 1 2007-01-02 00:00:00 00:07:28 34.5293 -10.2384 17.7 1 2007-01-02 00:00:00 23:01:03 35.0617 -1.435 -17.1 2 2007-01-03 00:00:00 01:14:29 36.5685 0.9043 36.8 1 – Mar Jun 01 '17 at 12:16
  • Actually i have a big csv file, what i posted here is just a little sample of my data, how can I convert the whole csv file ? – Mar Jun 01 '17 at 12:23
  • your code is working fine, but it's not giving me the results i need. – Mar Jun 01 '17 at 12:32
  • i need to plot the average of impacts per month – Mar Jun 01 '17 at 12:33
  • i checked data for one month ,and it's giving this error : KeyError: 'the label [2017-01] is not in the [index]' – Mar Jun 01 '17 at 12:39
  • well i think that i am mistaken, i need to plot the average of impacts per month , but i have multiple lines of the same day but in different times (hours), so i need first to group data daily ( a day could has got a lot of impacts) , then group it monthly and plot it , so i believe i should take into consideration the second column which is heure. if you know what i mean. – Mar Jun 01 '17 at 12:46
  • my desired output, is to plot the month on x axix and mean of impacts per month on y axix – Mar Jun 01 '17 at 12:50
  • it's not working , it says : raise TypeError('Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex') TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex – Mar Jun 01 '17 at 13:00
  • shall i send you my csv file to picture my data ? – Mar Jun 01 '17 at 13:00
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/145640/discussion-between-marie-antoinette-and-jezrael). – Mar Jun 01 '17 at 13:10