I have a pandas data frame as follows:
date | Item | count
------------------------------------
2016-12-06 10:45:08 | Item1 | 60
2016-12-06 10:45:08 | Item2 | 145
2016-12-06 09:45:00 | Item1 | 60
2016-12-06 09:44:54 | Item3 | 600
2016-12-06 09:44:48 | Item4 | 15
2016-12-06 11:45:08 | Item1 | 60
2016-12-06 10:45:08 | Item2 | 14
2016-11-06 09:45:00 | Item1 | 62
2016-11-06 09:44:54 | Item3 | 6
2016-11-06 09:44:48 | Item4 | 15
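To reproduce this, the table can be rebuilt as a DataFrame with the minimal sketch below. The rows are copied from the table. The date column deliberately starts as plain strings, which matches the object dtype the question reports before any conversion.

```python
import pandas as pd

# Rows copied from the table; "date" is kept as strings on purpose.
rows = [
    ("2016-12-06 10:45:08", "Item1", 60),
    ("2016-12-06 10:45:08", "Item2", 145),
    ("2016-12-06 09:45:00", "Item1", 60),
    ("2016-12-06 09:44:54", "Item3", 600),
    ("2016-12-06 09:44:48", "Item4", 15),
    ("2016-12-06 11:45:08", "Item1", 60),
    ("2016-12-06 10:45:08", "Item2", 14),
    ("2016-11-06 09:45:00", "Item1", 62),
    ("2016-11-06 09:44:54", "Item3", 6),
    ("2016-11-06 09:44:48", "Item4", 15),
]
df = pd.DataFrame(rows, columns=["date", "Item", "count"])
print(df.dtypes)
```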
I am trying to group the items by, let's say, hour of the day (or later, just by day) to get the following statistics, i.e. the list of items sold in each time period, such as:
- On 2016-12-06, from 09:00:00 to 10:00:00, Item1, Item3 and Item4 were sold; and so on.
- On 2016-12-06, Item1, Item2, Item3 and Item4 (unique items) were sold.
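Once date is a real datetime column, the two statistics above can be sketched without any time grouper. Each timestamp is floored to the start of its hour (or day), and the data is grouped on that value. This is one possible approach, not the only one. It assumes a pandas version that provides Series.dt.floor, and it only creates groups for periods that actually contain rows.

```python
import pandas as pd

rows = [
    ("2016-12-06 10:45:08", "Item1", 60), ("2016-12-06 10:45:08", "Item2", 145),
    ("2016-12-06 09:45:00", "Item1", 60), ("2016-12-06 09:44:54", "Item3", 600),
    ("2016-12-06 09:44:48", "Item4", 15), ("2016-12-06 11:45:08", "Item1", 60),
    ("2016-12-06 10:45:08", "Item2", 14), ("2016-11-06 09:45:00", "Item1", 62),
    ("2016-11-06 09:44:54", "Item3", 6), ("2016-11-06 09:44:48", "Item4", 15),
]
df = pd.DataFrame(rows, columns=["date", "Item", "count"])
df["date"] = pd.to_datetime(df["date"])

# Floor each timestamp to the start of its hour (e.g. 09:44:54 -> 09:00:00),
# then collect the distinct items seen in each hour.
hourly = df.groupby(df["date"].dt.floor("h"))["Item"].unique()

# Same idea at day granularity: distinct items per calendar day.
daily = df.groupby(df["date"].dt.floor("D"))["Item"].unique()

print(hourly)
print(daily)
```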
While I am still far from computing these statistics, I am stuck on grouping by time.
Initially, print df.dtypes showed:
date object
Item object
count int64
dtype: object
So I used the following line of code to convert the date column to a pandas datetime type:
df['date'] = pd.to_datetime(df['date'])
Now, print df.dtypes yields:
date datetime64[ns]
Item object
count int64
dtype: object
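The conversion changes only the dtype of the date column. The frame's index is left untouched. The check below is a small sketch that uses three rows from the table. It shows that the index is still the default RangeIndex, which is the type named in the TypeError. It also shows that set_index is one way to get a DatetimeIndex.

```python
import pandas as pd

# Three rows from the question's table, enough to inspect the types.
df = pd.DataFrame(
    [("2016-12-06 10:45:08", "Item1", 60),
     ("2016-12-06 09:44:54", "Item3", 600),
     ("2016-11-06 09:44:48", "Item4", 15)],
    columns=["date", "Item", "count"],
)
df["date"] = pd.to_datetime(df["date"])

print(df["date"].dtype)              # a datetime64 dtype after the conversion
print(type(df.index).__name__)       # RangeIndex: the index itself is unchanged

indexed = df.set_index("date")       # move the timestamps into the index
print(type(indexed.index).__name__)  # DatetimeIndex
```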
However, when I try to group by the date column using TimeGrouper with the following lines of code:
from pandas.tseries.resample import TimeGrouper
print df.groupby([df['date'],pd.TimeGrouper(freq='Min')])
I get the following TypeError:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
As per the suggestions given here or here, converting with pd.to_datetime should have resolved this issue.
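The message suggests that pd.TimeGrouper(freq='Min'), given no key, tried to bin the frame's index (a RangeIndex) rather than the converted date column. Below is a minimal sketch of two ways around that. It assumes a pandas version where pd.Grouper takes the place of TimeGrouper, which was deprecated in favour of pd.Grouper and is not available in current releases. One option passes key="date". The other moves date into the index first. Also note that putting df['date'] itself in the groupby list would group by each exact timestamp as well, so only the time grouper is passed here. Daily bins and nunique are chosen purely for illustration.

```python
import pandas as pd

rows = [
    ("2016-12-06 10:45:08", "Item1", 60), ("2016-12-06 10:45:08", "Item2", 145),
    ("2016-12-06 09:45:00", "Item1", 60), ("2016-12-06 09:44:54", "Item3", 600),
    ("2016-12-06 09:44:48", "Item4", 15), ("2016-12-06 11:45:08", "Item1", 60),
    ("2016-12-06 10:45:08", "Item2", 14), ("2016-11-06 09:45:00", "Item1", 62),
    ("2016-11-06 09:44:54", "Item3", 6), ("2016-11-06 09:44:48", "Item4", 15),
]
df = pd.DataFrame(rows, columns=["date", "Item", "count"])
df["date"] = pd.to_datetime(df["date"])

# Option 1: tell the grouper which column holds the timestamps.
by_key = df.groupby(pd.Grouper(key="date", freq="D"))["Item"].nunique()

# Option 2: make "date" the index; a Grouper without key bins the index.
by_index = df.set_index("date").groupby(pd.Grouper(freq="D"))["Item"].nunique()

# Time bins span the first to the last timestamp, so days without rows
# may appear with a count of 0; keep only days that had sales.
by_key = by_key[by_key > 0]
by_index = by_index[by_index > 0]
print(by_key)
```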
I have no idea how to resolve this so that I can move on to the statistics I am looking for. Any tips on fixing this error and on using TimeGrouper to obtain the statistics, preferably as a dictionary (or whatever format makes more sense), would be much appreciated.
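If a dictionary is the preferred output, one possible shape is {day: {hour: items}} for the hourly statistic and {day: items} for the daily one. That shape is an assumption, not the only sensible choice. The sketch below builds both with plain groupby iteration. The helper Series names day and hour are illustrative, not part of the original code.

```python
import pandas as pd

rows = [
    ("2016-12-06 10:45:08", "Item1", 60), ("2016-12-06 10:45:08", "Item2", 145),
    ("2016-12-06 09:45:00", "Item1", 60), ("2016-12-06 09:44:54", "Item3", 600),
    ("2016-12-06 09:44:48", "Item4", 15), ("2016-12-06 11:45:08", "Item1", 60),
    ("2016-12-06 10:45:08", "Item2", 14), ("2016-11-06 09:45:00", "Item1", 62),
    ("2016-11-06 09:44:54", "Item3", 6), ("2016-11-06 09:44:48", "Item4", 15),
]
df = pd.DataFrame(rows, columns=["date", "Item", "count"])
df["date"] = pd.to_datetime(df["date"])

# Hypothetical helper keys: calendar day as a string, hour as an integer.
day = df["date"].dt.strftime("%Y-%m-%d").rename("day")
hour = df["date"].dt.hour.rename("hour")

# {day: {hour: sorted distinct items sold in that hour}}
per_hour = {}
for (d, h), items in df.groupby([day, hour])["Item"]:
    per_hour.setdefault(d, {})[int(h)] = sorted(items.unique())

# {day: sorted distinct items sold that day}
per_day = {d: sorted(items.unique()) for d, items in df.groupby(day)["Item"]}

print(per_hour)
print(per_day)
```

Using strings and plain integers as keys keeps the result easy to serialize (e.g. to JSON). Timestamps would work as keys just as well.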