testdf = pd.DataFrame({'mth':[1,1,1,1,1,1,1,1],'dy':[1,1,1,1,1,1,1,1],'id':[1,2,5,6,7,8,9,10], 'frame':['Fo','Fm','Fp','Fmp','Fp','Fmp','Fp','Fmp'],'param':['p1','p1','p1','p1','p2','p2','p3','p3'], 'avg':[0.1,0.25,0.1,0.25,0.08,0.26,0.05,0.25]}).set_index(['mth','dy'])
testdf2 = pd.DataFrame({'mth':[1,1,1,1,1,1,1,1],'dy':[2,2,2,2,2,2,2,2], 'id':[1,2,5,6,7,8,9,10], 'frame':['Fo','Fm','Fp','Fmp','Fp','Fmp','Fp','Fmp'], 'param':['p1','p1','p1','p1','p2','p2','p3','p3'],'avg':[0.1,0.25,0.1,0.25,0.08,0.26,0.05,0.25]}).set_index(['mth','dy'])
mydf = pd.concat([testdf,testdf2])
I have some data in long format and need to compute new values for each param
based on pairs of frame
. In the apply(print) statement below you can see the relationship betwee frame and param. unfortunately for my first param
I have duplicated data - perhaps it would be easiest to remove rows corresponding to id 5 and 6.
mydf.groupby(['mth','dy','param']).apply(print)
id frame param avg
mth dy
1 1 1 Fo p1 0.10
1 2 Fm p1 0.25
1 5 Fp p1 0.10
1 6 Fmp p1 0.25
id frame param avg
mth dy
1 1 7 Fp p2 0.08
1 8 Fmp p2 0.26
id frame param avg
mth dy
1 1 9 Fp p3 0.05
1 10 Fmp p3 0.25
one approach i've considered is
mydf = mydf.query('(param!="p1") | (param=="p1" & (frame=="Fo" | frame=="Fm"))')
mydf.groupby(['mth','dy','param']).apply(lambda x: x.iloc[1].avg - x.iloc[0].avg)
but if your data gets out of order somehow you'd get the wrong answer. I'd like to use pivot
or a related function so I can use something to the effect of Fm - Fo or Fmp - Fp and access the values by name.
However, I can't seem to find a solution that appends a new column param_val
to the existing dataframe where the param_val
is duplicated for each frame
.
at the very least, I think I'd need to sort it by p1
and not p1
, eg. mydf.query('param=="p1"')
and mydf.query('param!="p1"')
before pivoting but then there should be a fairly concise columnar operation that eludes me.
I did look at Pivot duplicates rows into new columns Pandas but haven't wrapped my head around it. pivoting in pandas is super complicated compared to R! your help is appreciated. thanks!