1

I have a data frame sorted by IDs in long format. Most IDs have more than one row, and all rows have a date. I want to calculate the difference between dates in consecutive rows within each ID.

I've tried using a groupby object in Pandas, and pivoting the data to a wide format, but haven't had success. The set up is below. (Sorry, I couldn't figure out how to post the console output of the set up code below.)

The integers in the date columns are stand ins for the dates. I know how to work with dates, so don't need help there. The code should calculate the difference in dates between consecutive rows within an ID and put the difference in a new column called 'difference' (i.e., it should "start over" when it gets to the next ID). The first row in each ID will not have an entry for difference because there is no difference to calculate. The second should be the difference between dates in the first row and the second row within an ID, etc.

df = pd.DataFrame({'ID': [1,1,2,2,2,2,3,3,3],
'action': ['first', 'end', 'first', 'change', 'change', 
'last','first','change', 'end'],
'date': [1, 2, 2, 4, 6, 8, 1, 2, 9],
'movement': [1,0,1,1,1,0,1,1,0],})

Here is an image of the dataframe from my console: Example of Dataframe

Code to generate desired output is below:

desiredOutput = pd.DataFrame({'ID': [1,1,2,2,2,2,3,3,3],
'action': ['first', 'end', 'first', 'change', 'change', 
'last','first','change', 'end'],
'date': [1, 2, 2, 4, 6, 8, 1, 2, 9],
'movement': [1,0,1,1,1,0,1,1,0], 'difference':[0,1,0,2,2,2,0,1,7]})
jpp
  • 159,742
  • 34
  • 281
  • 339
macksimal
  • 11
  • 3

1 Answers1

2

This is a groupby problem. You can use GroupBy.diff, remembering to replace null values with 0 and convert to int:

df['difference'] = df.groupby('ID')['date'].diff().fillna(0).astype(int)

print(df)

#    ID  action  date  movement  difference
# 0   1   first     1         1           0
# 1   1     end     2         0           1
# 2   2   first     2         1           0
# 3   2  change     4         1           2
# 4   2  change     6         1           2
# 5   2    last     8         0           2
# 6   3   first     1         1           0
# 7   3  change     2         1           1
# 8   3     end     9         0           7
jpp
  • 159,742
  • 34
  • 281
  • 339