I have a large DataFrame whose rows I'm trying to group by time: in one case by minute, and in the other by 30-minute intervals.

import pandas as pd

df = pd.read_csv('2015-09-01.csv', header=None,
                 names=['ID', 'CITY', 'STATE', 'TIMESTAMP', 'TWEET'],
                 low_memory=False,
                 parse_dates=['TIMESTAMP'],
                 usecols=['STATE', 'TIMESTAMP', 'TWEET'])

Method 1

I have used this solution but if I try the following:

df = df2.groupby([df2.TIMESTAMP,pd.TimeGrouper(freq='H')])

It results in this error:

TypeError: axis must be a DatetimeIndex, but got an instance of 'Int64Index'

This is very weird, because TIMESTAMP is already being parsed as a date in read_csv.

Method 2

I tried setting TIMESTAMP as the index and then doing:

df = df2.groupby([df2.index,pd.TimeGrouper(freq='H')])

However, the result is not right: len(df) is 1350 rather than 24, even though the whole DataFrame covers only one day of data.

Method 3

I used this solution, but I'm not sure how to adapt it to 30-minute intervals:

df = df2.groupby(df2['TIMESTAMP'].map(lambda x: x.hour))

Sample Data

,STATE,TIMESTAMP,TWEET
0,TX,2015-09-25 00:00:01,Wish I could have gone to the game
1,USA,2015-09-25 00:00:01,PSA:  @HaileyCassidyy and I are not related in...
2,USA,2015-09-25 00:00:02,If you gonna fail don't bring some one down wi...
3,NJ,2015-09-25 00:00:02,@_falastinia hol up hol up I can't listen to t...
4,USA,2015-09-25 00:00:02,"Wind 0.0 mph ---. Barometer 30.235 in, Rising ..."
5,NJ,2015-09-25 00:00:03,WHY ISNT GREYS ANATOMY ON?!
6,MI,2015-09-25 00:00:03,@cody_cole06 you bet it is
7,WA,2015-09-25 00:00:04,"Could be worse, I guess, could be in a collisi..."
8,NY,2015-09-25 00:00:04,I'm totally using this graphic some day... tha...
9,USA,2015-09-25 00:00:04,@MKnightOwl @Andromehda LMAO I honestly didn't..
Leb

1 Answer

To group a column by a frequency, you need to pass its name to the key parameter of the Grouper, like this:

df.groupby(pd.Grouper(key='TIMESTAMP', freq='30min'))
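As a minimal sketch of how this could be used for both intervals from the question, the example below counts tweets per 30 minutes and per minute. The sample rows are made up for illustration, and the variable names `per_half_hour` and `per_minute` are hypothetical. It writes the frequencies with the 'min' alias for minutes ('30min', '1min').

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's DataFrame.
df = pd.DataFrame({
    "STATE": ["TX", "USA", "NJ", "MI", "WA"],
    "TIMESTAMP": pd.to_datetime([
        "2015-09-25 00:00:01",
        "2015-09-25 00:00:45",
        "2015-09-25 00:01:10",
        "2015-09-25 00:31:00",
        "2015-09-25 01:05:00",
    ]),
    "TWEET": ["a", "b", "c", "d", "e"],
})

# Count tweets in each 30-minute bin of the TIMESTAMP column.
per_half_hour = (
    df.groupby(pd.Grouper(key="TIMESTAMP", freq="30min"))["TWEET"].count()
)

# Count tweets in each 1-minute bin, then drop the bins with no tweets.
per_minute = df.groupby(pd.Grouper(key="TIMESTAMP", freq="1min"))["TWEET"].count()
per_minute = per_minute[per_minute > 0]

print(per_half_hour)
print(per_minute)
```

Because the Grouper is given a key, the DataFrame can keep its default integer index.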

Edit:

See the Grouper docs for more - but in general, when you do groupby([a,b]) you are grouping by the unique combinations of a and b.
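To illustrate the "unique combinations" behaviour, here is a small sketch with two made-up columns, `a` and `b`, used only for this example:

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
toy = pd.DataFrame({
    "a": ["x", "x", "y", "y", "y"],
    "b": [1, 2, 1, 1, 2],
})

# One group per distinct (a, b) pair that actually occurs in the data.
sizes = toy.groupby(["a", "b"]).size()
print(sizes)
```

There are four distinct pairs here, so there are four groups, even though `a` alone has only two distinct values.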

So in your example you were grouping by every unique timestamp value (df['TIMESTAMP']) combined with a time grouper applied to the index (pd.TimeGrouper defaults to the index if no key is specified). The TypeError was raised because your index was not datetime-like.

This is also why you were getting such a large number of groups after setting the index to 'TIMESTAMP': each unique timestamp formed its own group.
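The sketch below reproduces both symptoms on a few made-up rows. It uses `pd.Grouper(freq=...)` as a stand-in for the `pd.TimeGrouper` call in the question, and writes the hourly frequency as '60min'. The names `raised`, `n_combined` and `hourly` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the default index is a plain integer range.
df = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2015-09-25 00:00:01",
        "2015-09-25 00:00:02",
        "2015-09-25 00:20:00",
        "2015-09-25 01:10:00",
    ]),
    "TWEET": ["a", "b", "c", "d"],
})

# Without `key`, the Grouper targets the index, which is not datetime-like.
try:
    df.groupby(pd.Grouper(freq="60min")).size()
    raised = False
except TypeError:
    raised = True

indexed = df.set_index("TIMESTAMP")

# Raw index values plus an hourly Grouper: one group per unique timestamp.
n_combined = indexed.groupby([indexed.index, pd.Grouper(freq="60min")]).ngroups

# The hourly Grouper on its own: one group per hour.
hourly = indexed.groupby(pd.Grouper(freq="60min")).size()

print(raised, n_combined, len(hourly))
```

Dropping the raw timestamps from the list of keys is what brings the number of groups down to one per hour.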

chrisb
  • Yes, that works. I tried `df2.groupby([df2['TIMESTAMP'],pd.TimeGrouper(freq='H')])` and it was giving me a TypeError, even after doing `df2['TIMESTAMP'] = pd.to_datetime(df2['TIMESTAMP'])`. Any idea why? – Leb Oct 07 '15 at 01:24