I have a large dataframe whose rows I'm trying to group by time: in one case by minute and in the other by 30 minutes.
import pandas as pd

df2 = pd.read_csv('2015-09-01.csv', header=None,
                  names=['ID', 'CITY', 'STATE', 'TIMESTAMP', 'TWEET'],
                  low_memory=False,
                  parse_dates=['TIMESTAMP'], usecols=['STATE', 'TIMESTAMP', 'TWEET'])
Method 1
I tried this solution, but when I run the following:
df = df2.groupby([df2.TIMESTAMP,pd.TimeGrouper(freq='H')])
It results in this error:
TypeError: axis must be a DatetimeIndex, but got an instance of 'Int64Index'
which is strange, because TIMESTAMP is parsed as a date in read_csv.
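A likely explanation, offered tentatively: when pd.TimeGrouper (or its replacement pd.Grouper) is given no key, it bins on the DataFrame's index, and after read_csv that index is the default integer index, which matches the Int64Index in the error. pd.TimeGrouper was deprecated in pandas 0.21 in favor of pd.Grouper and has since been removed. The minimal sketch below assumes a pandas version with pd.Grouper and uses its key= parameter to bin the TIMESTAMP column directly. The small frame and its timestamps are hypothetical stand-ins for df2, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for df2: TIMESTAMP is a datetime column and the
# index is the default integer index that read_csv produces.
df2 = pd.DataFrame({
    "STATE": ["TX", "USA", "NJ", "NJ", "MI", "WA"],
    "TIMESTAMP": pd.to_datetime([
        "2015-09-25 00:00:01",
        "2015-09-25 00:00:59",
        "2015-09-25 00:01:30",
        "2015-09-25 00:29:59",
        "2015-09-25 00:30:00",
        "2015-09-25 01:15:00",
    ]),
    "TWEET": ["t1", "t2", "t3", "t4", "t5", "t6"],
})

# key= tells the Grouper to bin the TIMESTAMP column instead of the index.
# The raw TIMESTAMP column is deliberately not added as a second key, since
# that would create one group per distinct timestamp.
per_minute = df2.groupby(pd.Grouper(key="TIMESTAMP", freq="1min")).size()
per_half_hour = df2.groupby(pd.Grouper(key="TIMESTAMP", freq="30min")).size()

print(per_half_hour)
```

The per-minute result can include the empty minutes between the first and last timestamp with a count of 0; per_minute[per_minute > 0] keeps only the occupied ones.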
Method 2
I tried setting TIMESTAMP as the index and then doing:
df = df2.groupby([df2.index,pd.TimeGrouper(freq='H')])
However, the result isn't right: len(df) is 1350 rather than 24, even though the dataframe as a whole covers only one day of data.
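With TIMESTAMP as the index, the time Grouper alone should be enough. Putting df2.index in the key list as well presumably explains the 1350: every distinct timestamp becomes its own group, and the hourly bin just becomes a second label on it. A minimal sketch on hypothetical timestamps (not the poster's data), showing both groupby with pd.Grouper and the equivalent resample:

```python
import pandas as pd

# Hypothetical data; set_index turns TIMESTAMP into a DatetimeIndex.
df2 = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2015-09-25 00:00:01",
        "2015-09-25 00:00:59",
        "2015-09-25 00:01:30",
        "2015-09-25 00:29:59",
        "2015-09-25 00:30:00",
        "2015-09-25 01:15:00",
    ]),
    "TWEET": ["t1", "t2", "t3", "t4", "t5", "t6"],
}).set_index("TIMESTAMP")

# Group on the time bins only; no key= is needed because the index is datetime.
hourly = df2.groupby(pd.Grouper(freq="60min")).size()

# resample is the shorthand for time-based binning of a DatetimeIndex.
half_hourly = df2.resample("30min").size()

print(len(hourly))  # number of hourly bins
```

Here len(hourly) is 2 because the hypothetical data spans two hours; on a full day of data that touches every hour it would be 24.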
Method 3
I used this solution, but I'm not sure how to adapt it to 30-minute intervals:
df = df2.groupby(df2['TIMESTAMP'].map(lambda x: x.hour))
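The map(lambda x: x.hour) idea generalizes if each timestamp is mapped to the start of its interval rather than to its hour. The sketch below swaps in the .dt.floor accessor (an alternative technique, not the linked solution) and also shows the lambda-style form for comparison; the timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps standing in for df2['TIMESTAMP'].
df2 = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2015-09-25 00:00:01",
        "2015-09-25 00:00:59",
        "2015-09-25 00:29:59",
        "2015-09-25 00:30:00",
        "2015-09-25 01:15:00",
    ]),
})

# floor() rounds each timestamp down to the start of its bin, so rows in the
# same 30-minute (or 1-minute) window share a group key.
by_half_hour = df2.groupby(df2["TIMESTAMP"].dt.floor("30min")).size()
by_minute = df2.groupby(df2["TIMESTAMP"].dt.floor("min")).size()

# Lambda-style equivalent, closest to the hour-based version in the question.
by_half_hour_lambda = df2.groupby(
    df2["TIMESTAMP"].map(lambda x: x.floor("30min"))
).size()
```

Unlike pd.Grouper, this approach only yields groups for intervals that contain at least one row.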
Sample Data
,STATE,TIMESTAMP,TWEET
0,TX,2015-09-25 00:00:01,Wish I could have gone to the game
1,USA,2015-09-25 00:00:01,PSA: @HaileyCassidyy and I are not related in...
2,USA,2015-09-25 00:00:02,If you gonna fail don't bring some one down wi...
3,NJ,2015-09-25 00:00:02,@_falastinia hol up hol up I can't listen to t...
4,USA,2015-09-25 00:00:02,"Wind 0.0 mph ---. Barometer 30.235 in, Rising ..."
5,NJ,2015-09-25 00:00:03,WHY ISNT GREYS ANATOMY ON?!
6,MI,2015-09-25 00:00:03,@cody_cole06 you bet it is
7,WA,2015-09-25 00:00:04,"Could be worse, I guess, could be in a collisi..."
8,NY,2015-09-25 00:00:04,I'm totally using this graphic some day... tha...
9,USA,2015-09-25 00:00:04,@MKnightOwl @Andromehda LMAO I honestly didn't..
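As a check against the sample above, the sketch below reads those rows from a string and counts tweets per state in each 30-minute window by combining a pd.Grouper with the STATE column. Adding the leading comma to the header so the unnamed first column loads as the index is an assumption about how the sample was exported.

```python
import io

import pandas as pd

# The sample rows from the question; the header's leading comma lets the
# unnamed first column be read as the index.
SAMPLE = """\
,STATE,TIMESTAMP,TWEET
0,TX,2015-09-25 00:00:01,Wish I could have gone to the game
1,USA,2015-09-25 00:00:01,PSA: @HaileyCassidyy and I are not related in...
2,USA,2015-09-25 00:00:02,If you gonna fail don't bring some one down wi...
3,NJ,2015-09-25 00:00:02,@_falastinia hol up hol up I can't listen to t...
4,USA,2015-09-25 00:00:02,"Wind 0.0 mph ---. Barometer 30.235 in, Rising ..."
5,NJ,2015-09-25 00:00:03,WHY ISNT GREYS ANATOMY ON?!
6,MI,2015-09-25 00:00:03,@cody_cole06 you bet it is
7,WA,2015-09-25 00:00:04,"Could be worse, I guess, could be in a collisi..."
8,NY,2015-09-25 00:00:04,I'm totally using this graphic some day... tha...
9,USA,2015-09-25 00:00:04,@MKnightOwl @Andromehda LMAO I honestly didn't..
"""

df2 = pd.read_csv(io.StringIO(SAMPLE), index_col=0, parse_dates=["TIMESTAMP"])

# Tweets per state within each 30-minute window (MultiIndex: bin start, STATE).
counts = df2.groupby([pd.Grouper(key="TIMESTAMP", freq="30min"), "STATE"]).size()
print(counts)
```

All ten sample rows fall in the 00:00 window, so every count lands under that single bin; swapping freq="30min" for freq="1min" gives the per-minute version.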