I have a dataframe df as below:
start_time end_time count
0 2020-02-03 08:42:21.997 2020-02-03 09:34:18.737 3116
1 2020-02-03 09:34:18.837 2020-02-03 10:16:56.583 2557
2 2020-02-03 10:17:00.480 2020-02-03 13:18:51.540 10911
3 2020-02-03 13:18:51.640 2020-02-03 14:01:23.263 2551
4 2020-02-03 14:01:23.363 2020-02-03 14:43:56.977 255
I would like to group by only the date part of the start_time column and sum all corresponding count values within the same day. I found a relevant answer in this post.
Using this method:
data.groupby(data.date.dt.year)
However, I received the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-46-7618d5285bb9> in <module>()
1
----> 2 df.groupby(df.date.dt.year) # Adding ['start_time'] will return 'AttributeError: 'Series' object has no attribute 'date''.
3
4
5
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'date'
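A minimal sketch that reproduces the error, assuming start_time has already been parsed to datetime64 (here via pd.to_datetime on the values shown above; end_time is left out because it is not needed). Attribute access such as df.date is shorthand for df['date'], so it only works when a column with that name exists. Presumably the DataFrame in the referenced post had a datetime column called date, while this one does not. The datetime components of a Series are exposed through the .dt accessor rather than directly on the Series, which is why df['start_time'].date fails as well.

```python
import pandas as pd

# Example frame rebuilt from the table above (end_time omitted); pd.to_datetime
# gives start_time the datetime64 dtype that the .dt accessor requires
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-03 10:17:00.480",
        "2020-02-03 13:18:51.640",
        "2020-02-03 14:01:23.363",
    ]),
    "count": [3116, 2557, 10911, 2551, 255],
})

# df.date would look up a column named "date", which this frame does not have
print("date" in df.columns)   # False
print(hasattr(df, "date"))    # False, i.e. df.date raises AttributeError

# A Series has no .date attribute either; the date part lives under .dt
print(df["start_time"].dt.date.head(2))
```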
What is the problem, and how can I group these non-unique datetime values in the start_time column by date only and sum the corresponding count values?
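A possible one-liner, assuming the same df with start_time as datetime64 (if it was read in as strings, convert it first with pd.to_datetime). groupby accepts a Series as the grouping key, so the date part can be computed inline instead of being stored in a helper column; selecting ['count'] before .sum() restricts the aggregation to that column.

```python
import pandas as pd

# Same sample data as in the question (end_time omitted)
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-03 10:17:00.480",
        "2020-02-03 13:18:51.640",
        "2020-02-03 14:01:23.363",
    ]),
    "count": [3116, 2557, 10911, 2551, 255],
})

# Group by the calendar date of start_time (no helper column needed)
# and sum only the count column
daily = df.groupby(df["start_time"].dt.date)["count"].sum()
print(daily)
```

With the five sample rows, all of which fall on 2020-02-03, this yields a single group whose sum is 19390 (3116 + 2557 + 10911 + 2551 + 255). The resulting index is named start_time and holds datetime.date objects.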
Edit:
In fact, I was able to do it with a helper column:
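# .dt.date extracts the calendar date from each datetime64 value
df['date'] = df['start_time'].dt.date
# Group by the helper column and sum only the count column
# (summing the whole frame can fail on datetime columns in newer pandas)
df.groupby('date')['count'].sum()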
But I'd like to know whether I could do it directly, with something more straightforward, such as a one-liner like the one in the answer from the aforementioned post.
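Two further sketches that keep the group keys as datetime64 values rather than Python datetime.date objects, which can be convenient for later resampling or plotting; both assume the same df with start_time as datetime64. dt.normalize() floors each timestamp to midnight, and pd.Grouper(key=..., freq='D') bins start_time into calendar days.

```python
import pandas as pd

# Same sample data as in the question (end_time omitted)
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-03 10:17:00.480",
        "2020-02-03 13:18:51.640",
        "2020-02-03 14:01:23.363",
    ]),
    "count": [3116, 2557, 10911, 2551, 255],
})

# Variant 1: normalize() sets each timestamp's time to midnight, so the
# group keys remain datetime64 values
by_midnight = df.groupby(df["start_time"].dt.normalize())["count"].sum()

# Variant 2: pd.Grouper bins start_time into fixed daily intervals
by_grouper = df.groupby(pd.Grouper(key="start_time", freq="D"))["count"].sum()

print(by_midnight)
print(by_grouper)
```

The practical difference between the two: the Grouper version (like df.resample('D', on='start_time')['count'].sum()) should also emit a row with a sum of 0 for any day between the first and last timestamp that has no records, whereas grouping by .dt.date or .dt.normalize() only returns days that actually occur in the data.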