0

I have a Series where one of the columns is 'trans_id'. I am trying to group the series by hour and minute and finally count the number of unique trans_ids in each group. I did the following:

>>print df.columns
>>Index([u'ts_gmt', u' src', u' dest', u' web', u' trans_id'],
      dtype='object')

>>data['ts_gmt'] = pd.to_datetime(df['ts_gmt'])
# convert a datetime col into row index
>>tsData = df.set_index('ts_gmt')


tsData['HOUR'] = tsData.index.hour
tsData['MINUTE'] = tsData.index.minute
tsData.groupby(['HOUR', 'MINUTE'])['trans_id'].apply(lambda x: len(x.unique()))

But, I get this error:

KeyError: 'Column not found: trans_id'

>>print tsData.columns
Index([u' src', u' dest', u' web', u' trans_id',
       u'HOUR', u'MINUTE'],
      dtype='object')

I am able to get nice groupings of hour and min if I do:

grps = tsData.groupby(['HOUR', 'MINUTE'])
print grps

But, unable to proceed after this. I found this link that is similar: How to count distinct values in a column of a pandas group by object?

Any suggestion is appreciated.

Community
  • 1
  • 1
user1717931
  • 2,419
  • 5
  • 29
  • 40
  • 1
    "I have a Series where one of the columns is 'trans_id'." <- Prove it. :-) Please edit the result of `print(tsData.columns)` into your question (and I think you must mean DataFrame not Series, because a Series doesn't have multiple columns.) – DSM Sep 03 '15 at 21:39
  • Basically, I had a dataframe where one of the columns had a timestamp value (as a text string). I converted that col into datetime, then set it as index. I was under the impression that if the row index is timestamp, then, it is basically a Series. – user1717931 Sep 03 '15 at 21:51
  • 1
    Note that `trans_id` is *not* one of the columns.. – DSM Sep 03 '15 at 21:55
  • @DSM: Can you please elaborate? Why do you say that 'trans_id' is not one of the columns. print tsData.columns clearly says so. Am I very mistaken by this command result? – user1717931 Sep 03 '15 at 21:59
  • Read more carefully. Compare `u'ts_gmt'` with `u' trans_id'`. Notice the difference? – DSM Sep 03 '15 at 22:00
  • Arggg! Thanks a lot. I just did this as well: print list(tsData.columns.values) and it is more clear and the missing space can be more easily identified. many thanks!!! Edited my input file (header) and things are fine now! – user1717931 Sep 03 '15 at 22:02

0 Answers0