I have a DataFrame where one of the columns is 'trans_id'. I am trying to group it by hour and minute and then count the number of unique trans_ids in each group. I did the following:
>>print df.columns
Index([u'ts_gmt', u' src', u' dest', u' web', u' trans_id'],
dtype='object')
>>df['ts_gmt'] = pd.to_datetime(df['ts_gmt'])
# convert the datetime column into the row index
>>tsData = df.set_index('ts_gmt')
>>tsData['HOUR'] = tsData.index.hour
>>tsData['MINUTE'] = tsData.index.minute
>>tsData.groupby(['HOUR', 'MINUTE'])['trans_id'].apply(lambda x: len(x.unique()))
But I get this error:
KeyError: 'Column not found: trans_id'
>>print tsData.columns
Index([u' src', u' dest', u' web', u' trans_id',
u'HOUR', u'MINUTE'],
dtype='object')
I am able to get the groupings by hour and minute if I do:
grps = tsData.groupby(['HOUR', 'MINUTE'])
print grps
But I am unable to proceed after this. I found this similar question: How to count distinct values in a column of a pandas group by object?
Any suggestions are appreciated.
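One thing that stands out in the printed column names is the leading space in u' trans_id', so the key 'trans_id' may simply not match any column. Below is a minimal sketch under that assumption. The sample rows are hypothetical, not taken from the real data. It strips the whitespace from the column names and then counts distinct values per group with nunique():

```python
import pandas as pd

# Hypothetical sample data; the column names deliberately carry the same
# leading spaces that appear in the printed Index above.
df = pd.DataFrame({
    "ts_gmt": ["2015-01-01 10:00:05", "2015-01-01 10:00:40",
               "2015-01-01 10:00:59", "2015-01-01 10:01:10"],
    " src": ["a", "b", "c", "d"],
    " trans_id": [1, 1, 2, 3],
})

# Strip stray whitespace from every column name (' trans_id' -> 'trans_id').
df.columns = df.columns.str.strip()

# Parse the timestamps and use them as the row index.
df["ts_gmt"] = pd.to_datetime(df["ts_gmt"])
tsData = df.set_index("ts_gmt")
tsData["HOUR"] = tsData.index.hour
tsData["MINUTE"] = tsData.index.minute

# Count the distinct trans_id values within each (hour, minute) group.
unique_counts = tsData.groupby(["HOUR", "MINUTE"])["trans_id"].nunique()
print(unique_counts)
```

If the names cannot be changed, indexing with the exact key, including the space (' trans_id'), should work as well.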