0
testdf = pd.DataFrame({'mth':[1,1,1,1,1,1,1,1],'dy':[1,1,1,1,1,1,1,1],'id':[1,2,5,6,7,8,9,10], 'frame':['Fo','Fm','Fp','Fmp','Fp','Fmp','Fp','Fmp'],'param':['p1','p1','p1','p1','p2','p2','p3','p3'], 'avg':[0.1,0.25,0.1,0.25,0.08,0.26,0.05,0.25]}).set_index(['mth','dy'])
testdf2 = pd.DataFrame({'mth':[1,1,1,1,1,1,1,1],'dy':[2,2,2,2,2,2,2,2], 'id':[1,2,5,6,7,8,9,10], 'frame':['Fo','Fm','Fp','Fmp','Fp','Fmp','Fp','Fmp'], 'param':['p1','p1','p1','p1','p2','p2','p3','p3'],'avg':[0.1,0.25,0.1,0.25,0.08,0.26,0.05,0.25]}).set_index(['mth','dy'])
mydf = pd.concat([testdf,testdf2])

I have some data in long format and need to compute new values for each param based on pairs of frame. In the apply(print) statement below you can see the relationship betwee frame and param. unfortunately for my first param I have duplicated data - perhaps it would be easiest to remove rows corresponding to id 5 and 6.

mydf.groupby(['mth','dy','param']).apply(print)

        id frame param   avg
mth dy                      
1   1    1    Fo    p1  0.10
    1    2    Fm    p1  0.25
    1    5    Fp    p1  0.10
    1    6   Fmp    p1  0.25
        id frame param   avg
mth dy                      
1   1    7    Fp    p2  0.08
    1    8   Fmp    p2  0.26
        id frame param   avg
mth dy                      
1   1    9    Fp    p3  0.05
    1   10   Fmp    p3  0.25

one approach i've considered is

mydf = mydf.query('(param!="p1") | (param=="p1" & (frame=="Fo" | frame=="Fm"))')

mydf.groupby(['mth','dy','param']).apply(lambda x: x.iloc[1].avg - x.iloc[0].avg)

but if your data gets out of order somehow you'd get the wrong answer. I'd like to use pivot or a related function so I can use something to the effect of Fm - Fo or Fmp - Fp and access the values by name. However, I can't seem to find a solution that appends a new column param_val to the existing dataframe where the param_val is duplicated for each frame.

at the very least, I think I'd need to sort it by p1 and not p1, eg. mydf.query('param=="p1"') and mydf.query('param!="p1"') before pivoting but then there should be a fairly concise columnar operation that eludes me.

I did look at Pivot duplicates rows into new columns Pandas but haven't wrapped my head around it. pivoting in pandas is super complicated compared to R! your help is appreciated. thanks!

Dominik
  • 782
  • 7
  • 27
  • your issue is that the avg value is duplicated? Why not include it in your grouping - `mydf.groupby(['mth','dy','param','avg']).first()` – asongtoruin Jan 17 '19 at 15:52
  • if you group `avg` then you can't pivot and compute a new column with it.... – Dominik Jan 17 '19 at 16:19
  • I'm not entirely sure what you mean, but if you set `as_index=False` on your `groupby` operation you can still apply any functions to the avg column. Could you be a little clearer about what you expect the output to look like? – asongtoruin Jan 17 '19 at 16:55

0 Answers0