Subtract Last and First Dates for a groupby

Question

I have a dataframe that looks something like this:

ID	Category	Site	Task Completed
1	A	X	1/2/22 12:00:00AM
1	A	X	1/3/22 12:00:00AM
1	A	X	1/1/22 12:00:00AM
1	A	X	1/2/22 1:00:00AM
1	B	Y	1/1/22 1:00:00AM
2	A	Z	1/2/22 12:00:00AM
2	A	Z	1/1/22 12:00:00AM

As you can see, there can be multiple task completed dates for a ID/Category/Site combo.

What I want to find is the time difference (in days) between the first (min) Task Completed date and the last (max) task completed date for every ID/Category/Site combination within the dataset. The intended result would look something like this:

ID	Category	Site	Time Difference
1	A	X	2
1	B	Y	0
2	A	Z	1

So far, I know that I have to change the 'task_completed' field to datetime and use groupby for each field which looks something like this:

df = pd.DataFrame(
[[1,'A','X','1/2/22 12:00:00AM'], 
[1,'A','X','1/3/22 12:00:00AM'], 
[1,'A','X','1/1/22 12:00:00AM'], 
[1,'A','X','1/2/22 1:00:00AM'], 
[1,'B','Y','1/1/22 1:00:00AM'],
[2,'A','Z','1/2/22 12:00:00AM'],
[2,'A','Z','1/1/22 12:00:00AM'], 
columns=['ID', 'Category', 'Site', 'Task Completed'])

df['task_completed'] = pd.to_datetime(df['task_completed'])

res = df.sort_values('task_completed').groupby(['id','site','category']).first()

But I'm not sure how to get the max then subtract to get the intended result.

Concerning posting examples of dataframes please check [How to make pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — MagnusO_O, Oct 18 '22 at 17:38

INGl0R1AM0R1 · Answer 1 · 2022-10-18T17:55:39.310

1

I hope your task completed is on the datetime dtype

This will give you you wanted series, In this case, I apply a function for each groupby object, and use the dt acessor to check your month difference.

df['Month'] = df['Task Completed'].dt.day

df.groupby(['Category','ID','SITE'])['Month'].apply(lambda x : x.max()-x.min() if x.max()-x.min()>0 else 0)

edited Oct 18 '22 at 17:55

answered Oct 18 '22 at 17:44

INGl0R1AM0R1

1,532
5
16

I'm not looking for month difference, I am looking for days between the dates. Also, I want to group by all columns, not just 'category'. – CowboyCoder Oct 18 '22 at 17:53
Ah sorry about tha mate, not accustomed to the american datetimes. Just change the acessor to day. Also why would you groupby, all columns is that really necessary, you will stll only have a series as end result. Nonetheless just check edit – INGl0R1AM0R1 Oct 18 '22 at 17:55
1

@cowboycode If you are worried about the month, you can use the `day_of_year` function. Something like this `df['day_of_year'] = df['task_completed'].apply(lambda x: x.day_of_year)` – LazyClown Oct 18 '22 at 18:20

Subtract Last and First Dates for a groupby

1 Answers1