Grouping dataframe by custom date

Question

I have a large dataframe that I'm trying to group by date: in one case by minute, and in the other by 30-minute intervals.

import pandas as pd

df = pd.read_csv('2015-09-01.csv', header=None,
                 names=['ID', 'CITY', 'STATE', 'TIMESTAMP', 'TWEET'],
                 low_memory=False,
                 parse_dates=['TIMESTAMP'], usecols=['STATE', 'TIMESTAMP', 'TWEET'])

Method 1

I have used this solution but if I try the following:

df = df2.groupby([df2.TIMESTAMP,pd.TimeGrouper(freq='H')])

It results in this error:

TypeError: axis must be a DatetimeIndex, but got an instance of 'Int64Index'

which is very weird, because TIMESTAMP is being parsed as a date in read_csv.

Method 2

I tried setting TIMESTAMP as the index and then doing:

df = df2.groupby([df2.index,pd.TimeGrouper(freq='H')])

However, the result isn't right: len(df) is 1350 rather than 24, even though the whole dataframe covers only one day's worth of data.

Method 3

I used this solution, but I'm not sure how to adapt it to a 30-minute interval:

df = df2.groupby(df2['TIMESTAMP'].map(lambda x: x.hour))

Sample Data

,STATE,TIMESTAMP,TWEET
0,TX,2015-09-25 00:00:01,Wish I could have gone to the game
1,USA,2015-09-25 00:00:01,PSA:  @HaileyCassidyy and I are not related in...
2,USA,2015-09-25 00:00:02,If you gonna fail don't bring some one down wi...
3,NJ,2015-09-25 00:00:02,@_falastinia hol up hol up I can't listen to t...
4,USA,2015-09-25 00:00:02,"Wind 0.0 mph ---. Barometer 30.235 in, Rising ..."
5,NJ,2015-09-25 00:00:03,WHY ISNT GREYS ANATOMY ON?!
6,MI,2015-09-25 00:00:03,@cody_cole06 you bet it is
7,WA,2015-09-25 00:00:04,"Could be worse, I guess, could be in a collisi..."
8,NY,2015-09-25 00:00:04,I'm totally using this graphic some day... tha...
9,USA,2015-09-25 00:00:04,@MKnightOwl @Andromehda LMAO I honestly didn't..

chrisb · Accepted Answer · 2015-10-07T02:24:03.827

To group by a column at a given frequency, pass the column's name to the key parameter of pd.Grouper, like this:

df.groupby(pd.Grouper(key='TIMESTAMP', freq='30T'))
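As a minimal sketch of how this could look for both intervals from the question, the snippet below groups a small made-up frame by 30 minutes and by 1 minute. The sample rows are hypothetical, and the frequency is spelled '30min', which means the same as '30T' and is the spelling that recent pandas releases prefer.

```python
import pandas as pd

# Hypothetical sample standing in for the tweet data; the values are made up.
df = pd.DataFrame({
    'STATE': ['TX', 'USA', 'NJ', 'MI', 'WA'],
    'TIMESTAMP': pd.to_datetime([
        '2015-09-25 00:00:01',
        '2015-09-25 00:00:45',
        '2015-09-25 00:01:10',
        '2015-09-25 00:31:00',
        '2015-09-25 01:05:00',
    ]),
    'TWEET': ['a', 'b', 'c', 'd', 'e'],
})

# key= tells the Grouper to bin the TIMESTAMP column instead of the index.
per_half_hour = df.groupby(pd.Grouper(key='TIMESTAMP', freq='30min')).size()
per_minute = df.groupby(pd.Grouper(key='TIMESTAMP', freq='1min')).size()

print(per_half_hour)
print(per_minute.head())
```

Here size() only counts rows per bin; any other aggregation (count, agg, apply) can be used on the same groupby object.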

Edit:

See the Grouper docs for more, but in general, when you do groupby([a,b]) you are grouping by the unique combinations of a and b.

So in your example you were grouping by all the unique timestamp values (df['TIMESTAMP']) and by a time grouper applied to the index (pd.TimeGrouper defaults to the index if no key is specified). The TypeError occurred because your index was not datetime-like.

This is also why you were getting such a large number of groups after setting the index to 'TIMESTAMP': every unique timestamp in the index still formed its own group.
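A small sketch of both effects, on a made-up four-row frame. It assumes a current pandas, so pd.Grouper stands in for pd.TimeGrouper (which later releases removed), and the hourly frequency is written as '60min'.

```python
import pandas as pd

# Hypothetical sample: four rows, three distinct timestamps, all within one hour.
df2 = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([
        '2015-09-25 00:00:01',
        '2015-09-25 00:00:01',
        '2015-09-25 00:00:02',
        '2015-09-25 00:40:00',
    ]),
    'TWEET': ['a', 'b', 'c', 'd'],
})

# A Grouper without key= targets the index, which here is a plain integer index.
try:
    df2.groupby(pd.Grouper(freq='60min')).size()
    raised = False
except TypeError:
    raised = True

# With TIMESTAMP as the index, combining the raw index with an hourly Grouper
# produces one group per distinct (timestamp, hour bin) pair.
indexed = df2.set_index('TIMESTAMP')
n_combined = len(indexed.groupby([indexed.index, pd.Grouper(freq='60min')]).size())

# The hourly Grouper on its own produces one group per hour bin.
n_hourly = len(indexed.groupby(pd.Grouper(freq='60min')).size())

print(raised, n_combined, n_hourly)
```

Dropping the raw timestamps from the list of keys is what brings the group count down to one per time bin.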

edited Oct 07 '15 at 02:24

answered Oct 07 '15 at 01:16


Yes, that works. I tried `df2.groupby([df2['TIMESTAMP'],pd.TimeGrouper(freq='H')])` and it was giving me a TypeError, even after doing `df2['TIMESTAMP'] = pd.to_datetime(df2['TIMESTAMP'])`. Any idea why? – Leb Oct 07 '15 at 01:24
