I have a DataFrame that looks like the following:
I want to calculate the time since a value last changed in the "Project" and/or "Value" columns, grouped by the "Name" column. How would I go about doing this? The output should look something like this (I don't care about the units):
Edit:
I think that, in my quest for brevity, I didn't completely explain the problem. I know this could be done with a for loop that checks when "Project" or "Value" changes and then computes the difference in datetime, but I am looking for a more vectorized approach. The real df I am working with has almost 1,000,000 rows and 1,000 columns. I made a bit of progress on flagging where a value changes (as .diff() doesn't work for strings) and on computing the raw time difference between consecutive rows, with the following:
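# True where "Project" differs from the previous row within the same "Name" (the first row of each group counts as unchanged)
df["Changes"] = df.groupby("Name")["Project"].transform(lambda s: s.ne(s.shift().bfill()))
# Days between consecutive rows within each "Name"; .dt.days replaces astype('timedelta64[D]'), which pandas 2.x rejects
df["Day Delta"] = df.groupby("Name")["Date"].diff().dt.days.fillna(0)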
I am still a bit lost on how to turn this into the elapsed time since the last change.
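To make the target concrete, here is a minimal sketch of the kind of result I am after. It rests on several assumptions: the toy data is made up, the helper name `days_since_change` and the output column "Days Since Change" are hypothetical, "Date" is already a datetime column, and the counter resets to 0 on any row where "Project" or "Value" differs from the previous row for the same "Name". It flags the change rows, carries the date of the most recent change forward within each group, and subtracts.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-05", "2020-01-06",
        "2020-01-01", "2020-01-03", "2020-01-04",
    ]),
    "Project": ["x", "x", "y", "y", "x", "x", "x"],
    "Value": [1, 1, 1, 2, 5, 5, 6],
})


def days_since_change(df, cols=("Project", "Value")):
    """Days elapsed since any column in `cols` last changed, per Name."""
    cols = list(cols)
    out = df.sort_values(["Name", "Date"])

    # Previous row's values within each Name; NaN on the first row of a group.
    prev = out.groupby("Name")[cols].shift()

    # A row starts a new run if any tracked column differs from the previous
    # row. The first row of each group always compares unequal to NaN, so it
    # starts a run as well. Caveat: genuine NaN values also count as changes.
    changed = out[cols].ne(prev).any(axis=1)

    # Keep the date only on change rows, then carry it forward within a Name.
    run_start = out["Date"].where(changed).groupby(out["Name"]).ffill()

    out["Days Since Change"] = (out["Date"] - run_start).dt.days
    return out


result = days_since_change(df)
print(result)
```

Nothing here loops over groups in Python, so I would expect it to scale better than `groupby(...).apply(...)` with a lambda, although I have not measured it on the full frame. Passing a single column in `cols` would give the per-column version.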