Group non-unique datetime column by date and sum values in python

Question

I have a dataframe df as below:

                start_time                 end_time  count
0  2020-02-03 08:42:21.997  2020-02-03 09:34:18.737   3116
1  2020-02-03 09:34:18.837  2020-02-03 10:16:56.583   2557
2  2020-02-03 10:17:00.480  2020-02-03 13:18:51.540  10911
3  2020-02-03 13:18:51.640  2020-02-03 14:01:23.263   2551
4  2020-02-03 14:01:23.363  2020-02-03 14:43:56.977    255
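For anyone who wants to reproduce this, the sample frame can be rebuilt roughly as below. This is a sketch that assumes start_time and end_time should be datetime64 columns (parsed with pd.to_datetime); in the real data they might arrive as strings instead.

```python
import pandas as pd

# Rebuild the sample frame from the question; timestamps are parsed into datetime64 columns
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-03 10:17:00.480",
        "2020-02-03 13:18:51.640",
        "2020-02-03 14:01:23.363",
    ]),
    "end_time": pd.to_datetime([
        "2020-02-03 09:34:18.737",
        "2020-02-03 10:16:56.583",
        "2020-02-03 13:18:51.540",
        "2020-02-03 14:01:23.263",
        "2020-02-03 14:43:56.977",
    ]),
    "count": [3116, 2557, 10911, 2551, 255],
})

# Confirm that both time columns really are datetime64
print(df.dtypes)
```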

I would like to group by only the date part of the start_time column and sum the count values that fall on the same day. I found a relevant answer in this post.

Using this method:

data.groupby(data.date.dt.year)

However, I received the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-46-7618d5285bb9> in <module>()
      1 
----> 2 df.groupby(df.date.dt.year)      # Adding ['start_time'] will return 'AttributeError: 'Series' object has no attribute 'date''.
      3 
      4 
      5 

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180 
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'date'

What is the problem and how can I group these non-unique datetime values in the start_time column by date only and sum the values?

Edit:

In fact, I was able to do it with

df['date'] = df['start_time'].dt.date        # New column holding only the date part of start_time
df.groupby('date')['count'].sum()            # Sum count per date

But I'd like to know whether I can do it directly, with something more straightforward such as the one-liner shown in the answer to the aforementioned post.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html , by: mapping function, labels or list of labels — ansev, Mar 13 '20 at 00:07
Please provide the entire error message. What did you expect `data.date` to do? — AMC, Mar 13 '20 at 00:17
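As the first comment points out, `by` also accepts a function. A minimal sketch, assuming start_time is datetime64: a callable passed to groupby is applied to each index label, so start_time has to be moved into the index first. The third row (2020-02-04, count 100) is made up so that there is more than one group.

```python
import pandas as pd

df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 13:18:51.640",
        "2020-02-04 09:00:00.000",  # hypothetical extra row on a second day
    ]),
    "count": [3116, 2551, 100],
})

# The lambda receives each index label (a Timestamp) and returns its datetime.date,
# which is used as the group key
daily = df.set_index("start_time").groupby(lambda ts: ts.date())["count"].sum()
print(daily)
```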

Fish11 · Accepted Answer · 2020-03-13T18:58:26.260


Super close. Calling .dt.date on a datetime Series is how you access just the date portion of each datetime value (https://www.geeksforgeeks.org/python-pandas-series-dt-date/). Try:

data.groupby(data["start_time"].dt.date)["count"].sum()
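A self-contained version of that one-liner, using only the start_time and count values from the question (end_time is left out because the grouping does not need it):

```python
import pandas as pd

# start_time and count values copied from the question's sample frame
data = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-03 10:17:00.480",
        "2020-02-03 13:18:51.640",
        "2020-02-03 14:01:23.363",
    ]),
    "count": [3116, 2557, 10911, 2551, 255],
})

# .dt.date turns each timestamp into a datetime.date, which becomes the group key
daily = data.groupby(data["start_time"].dt.date)["count"].sum()
print(daily)  # a single group, 2020-02-03, with a total of 19390
```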

Here is some background information about the indexing that I think you're missing:

When we write data["start_time"], we are getting column start_time from your dataframe data. An equivalent way of getting this column is to use data.start_time. When you try to access data.date (which is equivalent to data["date"]), we get an attribute error because your dataframe data does not have a column called date.
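The points about column access can be checked directly. A sketch on a one-row copy of the data; note that the two spellings do not fail in the same way when the column is missing: attribute access raises AttributeError, while bracket access raises KeyError.

```python
import pandas as pd

data = pd.DataFrame({
    "start_time": pd.to_datetime(["2020-02-03 08:42:21.997"]),
    "count": [3116],
})

# Attribute access and bracket access return the same column when it exists
print(data.start_time.equals(data["start_time"]))  # True

# There is no column called "date", so both spellings fail
print("date" in data.columns)  # False
try:
    _ = data.date
except AttributeError as exc:
    print("AttributeError:", exc)
try:
    _ = data["date"]
except KeyError as exc:
    print("KeyError:", exc)
```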

If the start_time column is of type datetime, then it has an accessor called dt, which in turn has the attribute date; that is what we want to group by. We can access it through data.start_time.dt.date or data["start_time"].dt.date.
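The .dt accessor only exists for datetime-like values. If start_time had been read in as plain strings (a hypothetical case here, e.g. a CSV loaded without date parsing), .dt raises an AttributeError and the column has to be converted with pd.to_datetime first:

```python
import pandas as pd

# Hypothetical case: start_time holds plain strings instead of datetime64 values
data = pd.DataFrame({
    "start_time": ["2020-02-03 08:42:21.997", "2020-02-03 13:18:51.640"],
    "count": [3116, 2551],
})

raised = False
try:
    data["start_time"].dt.date
except AttributeError as exc:
    # pandas refuses the .dt accessor on non-datetimelike values
    raised = True
    print("AttributeError:", exc)

# After conversion to datetime64, .dt.date is available
data["start_time"] = pd.to_datetime(data["start_time"])
dates = data["start_time"].dt.date
print(dates.tolist())
```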

When you write data["date"] = data["start_time"].dt.date, you are creating a new column in your dataframe called date that holds the date part of your start_time column. You can now access it through data.date (or data["date"]), which is why your solution works.
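To see that the two-step approach and the one-liner agree, here is a sketch on two rows of the sample data:

```python
import pandas as pd

data = pd.DataFrame({
    "start_time": pd.to_datetime(["2020-02-03 08:42:21.997", "2020-02-03 13:18:51.640"]),
    "count": [3116, 2551],
})

# Two-step: store the date part as a real column named "date"
data["date"] = data["start_time"].dt.date
# data.date now works, because the column exists
two_step = data.groupby(data.date)["count"].sum()

# One-liner: group directly on the derived Series
one_liner = data.groupby(data["start_time"].dt.date)["count"].sum()

# Same keys and sums; only the index name differs ("date" vs "start_time")
print(two_step.tolist() == one_liner.tolist())
print(list(two_step.index) == list(one_liner.index))
```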

edited Mar 13 '20 at 18:58

answered Mar 13 '20 at 00:01


you need to explain that you can pass a mapping function to groupby – ansev Mar 13 '20 at 00:08
Thanks. It worked:) Any idea how the error came about? – nilsinelabore Mar 13 '20 at 00:11
the column date is not in your dataframe – ansev Mar 13 '20 at 00:12
yes, you were trying to access the date column from the dataframe with `data.date` which is equivalent to `data['date']` – Fish11 Mar 13 '20 at 00:13
first you get the start_time Series, then you access the date using dt.date ... – ansev Mar 13 '20 at 00:13
Should I use `df.groupby(df['start_time'].date.dt.date)` then? The code returned `AttributeError: 'Series' object has no attribute 'date'` – nilsinelabore Mar 13 '20 at 00:20
I've updated the answer to give a little more background behind how this is actually working – Fish11 Mar 13 '20 at 18:59
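Regarding the question about df['start_time'].date: a Series has no date attribute of its own; the date part is only reachable through the .dt accessor, i.e. df['start_time'].dt.date. If you would rather keep datetime64 group keys than datetime.date objects, two other one-liners should work, sketched below with a made-up extra row on 2020-02-04 so there are two days:

```python
import pandas as pd

df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 13:18:51.640",
        "2020-02-04 09:00:00.000",  # hypothetical extra row on a second day
    ]),
    "count": [3116, 2551, 100],
})

# A Series has no .date attribute; the date lives behind the .dt accessor
print(hasattr(df["start_time"], "date"))  # False

# Truncate each timestamp to midnight; the group keys stay datetime64
by_normalize = df.groupby(df["start_time"].dt.normalize())["count"].sum()

# pd.Grouper bins the start_time column into calendar days
by_grouper = df.groupby(pd.Grouper(key="start_time", freq="D"))["count"].sum()

print(by_normalize)
print(by_grouper)
```

One difference: pd.Grouper with freq="D" bins like resample, so it should also emit a row with a sum of 0 for any calendar day inside the range that has no rows, whereas dt.normalize() only produces keys for days that actually occur.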
