
I am trying to apply a custom function that takes two arguments to two specific columns of a grouped DataFrame (the result of groupby).

I have tried using groupby together with apply, but any suggestion is welcome.

I have the following dataframe:

    id    y       z
    115  10      820
    115  12      960
    115  13     1100
    144  25     2500
    144  55     5500
    144  65      960
    144  68     6200
    144  25     2550
    146  25     2487
    146  25     2847
    146  25     2569
    146  25     2600
    146  25     2382
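To experiment with the snippets on this page, the sample data can be rebuilt as a DataFrame. This is a small sketch: the values are copied from the table above, and the name `data` matches the call used below.

```python
import pandas as pd

# Sample data from the table above, rebuilt as a DataFrame named `data`
data = pd.DataFrame({
    "id": [115, 115, 115, 144, 144, 144, 144, 144, 146, 146, 146, 146, 146],
    "y":  [10, 12, 13, 25, 55, 65, 68, 25, 25, 25, 25, 25, 25],
    "z":  [820, 960, 1100, 2500, 5500, 960, 6200, 2550,
           2487, 2847, 2569, 2600, 2382],
})

# Rows per id: 115 has 3, 144 has 5, 146 has 5
print(data.groupby("id").size())
```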

I would like to apply a custom function with two arguments and get one result per id.

    def train_logmodel(x, y):
        ##.........
        return x

    data.groupby('id')[['y','z']].apply(train_logmodel)

This fails with:

    TypeError: train_logmodel() missing 1 required positional argument: 'y'

How can I pass 'y' and 'z' as the two arguments so that the function estimates the desired value 'x' for each id?

An example of the expected output:

    id     x
    115  0.23
    144  0.45
    146  0.58

This differs slightly from the question "How to apply a function to two columns of Pandas dataframe": here the function has to be applied to a groupby object, which behaves differently from a plain DataFrame.
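The difference is easiest to see by looking at what the grouped object actually contains. A small sketch, rebuilding the sample data by hand: iterating over the groupby yields one (id, sub-DataFrame) pair per id, and apply calls the function per group with that sub-DataFrame as its single argument. That is why a function declared with two parameters fails with "missing 1 required positional argument".

```python
import pandas as pd

data = pd.DataFrame({
    "id": [115, 115, 115, 144, 144, 144, 144, 144, 146, 146, 146, 146, 146],
    "y":  [10, 12, 13, 25, 55, 65, 68, 25, 25, 25, 25, 25, 25],
    "z":  [820, 960, 1100, 2500, 5500, 960, 6200, 2550,
           2487, 2847, 2569, 2600, 2382],
})

group_info = {}
for key, group in data.groupby("id")[["y", "z"]]:
    # Each `group` is an ordinary DataFrame holding only the selected columns
    # for one id; this single object is what apply() hands to the function.
    group_info[int(key)] = (type(group).__name__, list(group.columns), len(group))
    print(key, group_info[int(key)])
```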

Thanks in advance!

G_Fabian
Comment: Possible duplicate of [How to apply a function to two columns of Pandas dataframe](https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe) – SkippyNBS, Jul 09 '19 at 20:18

1 Answer


Since I don't know what your train_logmodel function does, I can only give a general example. GroupBy.apply passes each group to your function as a single DataFrame, so the function should take one argument and get the columns from it inside the function body:

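    def train_logmodel(data):
        # `data` is the sub-DataFrame for one id; the columns are read from it here
        return (data.z / data.y).min()

    # df is the DataFrame from the question (called `data` there). Selecting
    # ['y', 'z'] passes only those columns; recent pandas versions warn when
    # apply also receives the grouping column.
    df.groupby('id')[['y', 'z']].apply(train_logmodel)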

Result:

    id
    115    80.000000
    144    14.769231
    146    95.280000
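If you want to keep the original two-argument signature, one option is to wrap it in a lambda that unpacks the two columns of each group, and then turn the id-indexed Series into the id / x layout from the question with reset_index. This is a sketch: the body of train_logmodel here is a hypothetical placeholder (the same minimum z/y ratio as above), since the real model code was not posted, so the x values match the result above rather than the 0.23/0.45/0.58 in the question.

```python
import pandas as pd

data = pd.DataFrame({
    "id": [115, 115, 115, 144, 144, 144, 144, 144, 146, 146, 146, 146, 146],
    "y":  [10, 12, 13, 25, 55, 65, 68, 25, 25, 25, 25, 25, 25],
    "z":  [820, 960, 1100, 2500, 5500, 960, 6200, 2550,
           2487, 2847, 2569, 2600, 2382],
})


def train_logmodel(y, z):
    # Hypothetical placeholder for the real model: the smallest z/y ratio.
    # Replace this body with the actual estimation code.
    return (z / y).min()


# apply() passes each group as one DataFrame `g`; the lambda unpacks its
# two columns and forwards them as the two positional arguments.
x_by_id = data.groupby("id")[["y", "z"]].apply(
    lambda g: train_logmodel(g["y"], g["z"])
)

# Move the id index back into a column and name the values column "x"
out = x_by_id.reset_index(name="x")
print(out)
```

The lambda keeps train_logmodel independent of pandas grouping: it still receives two plain Series, so the same function can also be called directly on the full columns.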
Stef