0

I have a DataFrame the looks like the following:

Inital DataFrame

I want to be able to calculate the time since a value changes in the "Project" and/or "Value" columns grouped by the "Name" column. How would I go about doing this? The output should be something like this (I don't care about the units):

Final DataFrame

Edit:

I think in my quest for brevity, I didn't completely explain the problem. I know this could be done with a for loop to check when a "Project" or "Value" changes and then compute the difference in datetime, but I was looking for a more vectorized approach. The real df I am working with is almost 1,000,000 rows and 1,000 columns. I made a bit of progress figuring out where a value changes (as .diff() doesn't work for strings) and the raw time changes with the following:

df["Changes"] = df.groupby("Name").apply(lambda x: x['Project'].ne(x['Project'].shift().bfill())).values

df['Day Delta'] = df.groupby("Name").apply(lambda x: (x.Date -x.Date.shift(1)).astype('timedelta64[D]')).fillna(0).values

Resulting in: enter image description here

I am still a bit lost figuring out how to translate this to the change in time since the last change.

  • [pandas.Timedelta](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#:~:text=Represents%20a%20duration%2C%20the%20difference,with%20it%20in%20most%20cases.) – BeanBagTheCat Jan 16 '21 at 17:47

2 Answers2

0

You can use this:

import datetime 

dt = datetime.datetime.now()
seq = int(dt.strftime("%Y%m%d"))

To transform your dates to integers. Then compare the values in the Project and Value column. If they are different, then calculate the difference between the dates (after you transformed the dates to integers) and you get your value.

DPM
  • 845
  • 7
  • 33
  • Thanks for the response! I don’t think I was completely clear with my question - see my edit. I am looking for a vectorized way of comparing and then calculating the difference. – Nathan Ballou Jan 18 '21 at 02:52
0

I think I finally figured out a way of doing this. I first modified the changes flag to give me a change on the first value:

df["Project Change"] = df.groupby("Name").apply(lambda x: x['Project'].ne(x['Project'].shift().ffill())).values

Yielding:

DataFrame with Changes

I then did a date difference when the value was true, grouped by the name:

df['Days Since Project Change'] = df['Date']-df['Date'].where(df["Project Change"]).groupby(df['Name']).ffill()

Resulting in:

DataFrame Final

Or, all together:

df.drop(columns = ["Project Change"], inplace = True)

df['Days Since Value Change'] = df['Date']-df['Date'].where(df.groupby("Name").apply(lambda x: x['Value'].ne(x['Value'].shift().ffill())).values).groupby(df['Name']).ffill()

enter image description here