-1

I have a dataframe that looks something like this:

ID Category Site Task Completed
1 A X 1/2/22 12:00:00AM
1 A X 1/3/22 12:00:00AM
1 A X 1/1/22 12:00:00AM
1 A X 1/2/22 1:00:00AM
1 B Y 1/1/22 1:00:00AM
2 A Z 1/2/22 12:00:00AM
2 A Z 1/1/22 12:00:00AM

As you can see, there can be multiple task completed dates for a ID/Category/Site combo.

What I want to find is the time difference (in days) between the first (min) Task Completed date and the last (max) task completed date for every ID/Category/Site combination within the dataset. The intended result would look something like this:

ID Category Site Time Difference
1 A X 2
1 B Y 0
2 A Z 1

So far, I know that I have to change the 'task_completed' field to datetime and use groupby for each field which looks something like this:

df = pd.DataFrame(
[[1,'A','X','1/2/22 12:00:00AM'], 
[1,'A','X','1/3/22 12:00:00AM'], 
[1,'A','X','1/1/22 12:00:00AM'], 
[1,'A','X','1/2/22 1:00:00AM'], 
[1,'B','Y','1/1/22 1:00:00AM'],
[2,'A','Z','1/2/22 12:00:00AM'],
[2,'A','Z','1/1/22 12:00:00AM'], 
columns=['ID', 'Category', 'Site', 'Task Completed'])

df['task_completed'] = pd.to_datetime(df['task_completed'])

res = df.sort_values('task_completed').groupby(['id','site','category']).first()

But I'm not sure how to get the max then subtract to get the intended result.

Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
CowboyCoder
  • 103
  • 5
  • 1
    Concerning posting examples of dataframes please check [How to make pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – MagnusO_O Oct 18 '22 at 17:38

1 Answers1

1

I hope your task completed is on the datetime dtype

This will give you you wanted series, In this case, I apply a function for each groupby object, and use the dt acessor to check your month difference.

df['Month'] = df['Task Completed'].dt.day

df.groupby(['Category','ID','SITE'])['Month'].apply(lambda x : x.max()-x.min() if x.max()-x.min()>0 else 0)
INGl0R1AM0R1
  • 1,532
  • 5
  • 16
  • I'm not looking for month difference, I am looking for days between the dates. Also, I want to group by all columns, not just 'category'. – CowboyCoder Oct 18 '22 at 17:53
  • Ah sorry about tha mate, not accustomed to the american datetimes. Just change the acessor to day. Also why would you groupby, all columns is that really necessary, you will stll only have a series as end result. Nonetheless just check edit – INGl0R1AM0R1 Oct 18 '22 at 17:55
  • 1
    @cowboycode If you are worried about the month, you can use the `day_of_year` function. Something like this `df['day_of_year'] = df['task_completed'].apply(lambda x: x.day_of_year)` – LazyClown Oct 18 '22 at 18:20