I have a timeseries dataset containing scores on scales of depression, anxiety, and trauma for patients. Data was collected at 6 time points for each patient.
mhs_data.head(10)
ID BDI GAD TSQ age
1 57 9 4 22
1 36 9 4 22
1 37 9 4 22
1 38 7 3 22
1 41 8 3 22
1 39 7 3 22
2 29 14 7 35
2 27 12 6 35
2 27 11 6 35
2 23 11 3 35
I want to create a new dataset where each patient has only one value for each variable, representing the difference between the first and last recorded data points. So, it will look like this:
ID BDI GAD TSQ age
1 18 2 1 22
2 1 0 2 35
. . . . .
. . . . .
. . . . .
I've grouped the data and aggregated by first and last scores:
mhs_agg = mhs_data.groupby("ID").agg(['first','last'])
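As a rough sketch of where that aggregation leads: agg(['first','last']) returns columns with two levels (the variable name, then first/last), so each level can be pulled out with xs and the two subtracted. The frame below is rebuilt from only the ten rows shown above, so patient 2 has just four rows here and its differences will not match the expected table. The sketch also assumes rows are already in chronological order within each ID and that the difference is first minus last, which is what the expected values for patient 1 imply.

```python
import pandas as pd

# Sample rebuilt from the ten rows shown above (patient 2 is incomplete here).
mhs_data = pd.DataFrame({
    "ID":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "BDI": [57, 36, 37, 38, 41, 39, 29, 27, 27, 23],
    "GAD": [9, 9, 9, 7, 8, 7, 14, 12, 11, 11],
    "TSQ": [4, 4, 4, 3, 3, 3, 7, 6, 6, 3],
    "age": [22] * 6 + [35] * 4,
})

# Columns become a two-level index: (variable, 'first' / 'last').
mhs_agg = mhs_data.groupby("ID").agg(["first", "last"])

# Take a cross-section of each statistic; the inner level is dropped,
# leaving plain BDI / GAD / TSQ / age columns in both frames.
first = mhs_agg.xs("first", axis=1, level=1)
last = mhs_agg.xs("last", axis=1, level=1)

# Assumed direction: first minus last. age comes out as 0 for everyone.
diff = first - last
print(diff)
```

Note that age is still included in this version and is 0 in every row.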
How can I proceed from here, or is there a more efficient way of doing this? I also have age, which I don't want to compute a difference for (it would come out as 0 for everyone).
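One possible shape for this, as a sketch rather than a definitive answer: restrict the difference to the score columns and carry age over separately with its first value per patient. The score_cols list and the result name are illustrative choices, the sample again contains only the ten rows shown above (so patient 2 will not match the expected table), and the difference is assumed to be first minus last with rows already in time order.

```python
import pandas as pd

# Sample rebuilt from the ten rows shown above (patient 2 is incomplete here).
mhs_data = pd.DataFrame({
    "ID":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "BDI": [57, 36, 37, 38, 41, 39, 29, 27, 27, 23],
    "GAD": [9, 9, 9, 7, 8, 7, 14, 12, 11, 11],
    "TSQ": [4, 4, 4, 3, 3, 3, 7, 6, 6, 3],
    "age": [22] * 6 + [35] * 4,
})

# Columns to difference; age is deliberately left out.
score_cols = ["BDI", "GAD", "TSQ"]

grouped = mhs_data.groupby("ID")

# First minus last for the score columns only (assumed direction).
result = grouped[score_cols].first() - grouped[score_cols].last()

# age is constant per patient, so just keep its first value.
result["age"] = grouped["age"].first()

# Turn ID back into an ordinary column.
result = result.reset_index()
print(result)
```

One thing to keep in mind: the groupby first and last methods return the first and last non-null value in each column, so a missing score at the final time point would silently fall back to an earlier one.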
I've seen all of the following posts and none of the suggestions seem to work for my specific case.
How to apply "first" and "last" functions to columns while using group by in pandas?
Python/Pandas - Aggregating dataframe with first/last function without grouping