So I have the following data frame:
In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame([['0068edf090ceaf1356', '0068edf090ceaf1356', '0068edf090ceaf1356','0068edf090ceaf1356', '0068edf090ceaf1356', '009eft67eaa133cea4', '009eft67eaa133cea4', '009eft67eaa133cea4', '009eft67eaa133cea4'], [-26, -26 -36, 81, 181, -51, -81, 61, 71], [18.0, 27.0, 53.0, 43.6, 12.4, 24.4, 63.0,72.8]], columns = ['uuid', 'prom_completed_date', 'prom_score'])
In [2]: df
Out[2]:
uuid prom_completed_date prom_score
0068edf090ceaf1356 -26 18.0
0068edf090ceaf1356 -26 18.0
0068edf090ceaf1356 -36 27.0
0068edf090ceaf1356 81 53.0
0068edf090ceaf1356 181 43.6
009eft67eaa133cea4 -51 12.4
009eft67eaa133cea4 -81 24.4
009eft67eaa133cea4 61 63.0
009eft67eaa133cea4 71 72.8
Where each patient has multiple entries. Bearing in mind that the first two entries are not duplicates but are the same as there are other columns with different options so rather than the pre-op average being (18+18+27)/3 it should be (18+27)/2.
I want to create a new data frame where each uuid has three new columns:
- an average PROM score where the values in the prom_completed_date_relative are negative
- an average PROM score where the values in the prom_completed_date_relative are positive
- the difference between the two above averages.
I'm not exactly sure how to do the coding for this in python, whilst ensuring that the uuid's are grouped.
I'm looking for something like this:
In [3]:
Out[3]:
uuid postop_avgPROM preop_avgPROM difference
0068edf090ceaf1356 48.3 22.5 25.8
009eft67eaa133cea4 67.9 18.4 49.5
I have tried the following:
df.query("prom_completed_date_relative">0).groupby("uuid")["prom_score"].mean().reset_index(name="postop_avgPROM_score")
but it does not seem to work, unfortunately.
Thank you.