Apply a function to rows within a DataFrame

Question

Assume I have the following df:

df = pd.DataFrame({'A': [120,108.6], 'B': [109, 147]})

Assume I have the following function:

def cpt_p(A, B):
    n = np.arange(1, B+1)
    p = [A] * B # Creates a repeating value of A of length B i.e. [A, A, A, ...]
    return p * n

Could someone show how I'd apply this to my df? The following does not work:

df['C'] = df.apply(cpt_p(df[0], df[1]), axis=1)

Don't forget good old list comprehension `df["C"] = [cpt_p(x,y) for x,y in zip(df["A"], df["B"])]`. — Henry Yik, Jun 27 '21 at 15:44

SeaBean · Answer 1 · 2021-06-27T15:31:54.513

You can pass a lambda function to .apply() with axis=1. The lambda receives each row as x, so x['A'] and x['B'] give that row's values for columns A and B. Put each value at the position the function expects, e.g. cpt_p(x['A'], x['B']) passes the value of column A as the first parameter and the value of column B as the second (for each row):

def cpt_p(A, B):
    p = [A] * int(B) # Creates a repeating value of A of length B i.e. [A, A, A, ...]
    return p


df['C'] = df.apply(lambda x: cpt_p(x['A'], x['B']), axis=1)

Another method:

You can also apply a function to rows with list(map()), like below:

df['C'] = list(map(cpt_p, df['A'], df['B']))

Pass the function as the first argument to map(), followed by one iterable per function parameter (here, column A and then column B).

The advantage of list(map()) is that it is generally faster than apply() with axis=1, possibly more than 3x faster. You can see this post for an execution time comparison for some use cases.

Result:

print(df)

       A    B                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  C
0  120.0  109  [120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, ...]
1  108.6  147  [108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, 108.6, ...]
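The speed claim above can be checked on your own machine with a small timing sketch. The frame `bench` and the helpers `with_apply` and `with_map` are hypothetical names invented for this illustration, and the row count is arbitrary. Actual timings depend on the pandas version and the data, so no ratio is asserted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data (not from the question): 5,000 random rows.
rng = np.random.default_rng(0)
bench = pd.DataFrame({
    "A": rng.uniform(100, 150, size=5_000),
    "B": rng.integers(1, 20, size=5_000),
})

def cpt_p(A, B):
    # Same function as in this answer: A repeated B times.
    return [A] * int(B)

def with_apply():
    # Row-wise apply: each row arrives as a Series named x.
    return bench.apply(lambda x: cpt_p(x["A"], x["B"]), axis=1).tolist()

def with_map():
    # map() walks the two columns in parallel, one pair of values per row.
    return list(map(cpt_p, bench["A"], bench["B"]))

# Both approaches should build the same lists.
assert with_apply() == with_map()

t_apply = timeit.timeit(with_apply, number=3)
t_map = timeit.timeit(with_map, number=3)
print(f"apply(axis=1): {t_apply:.3f}s   list(map()): {t_map:.3f}s")
```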

score 0 · Answer 2 · 2021-06-27T14:18:53.973


This is a much simpler way to go about it.

df["C"] = df["B"].apply(lambda x: np.ones(x))*df["A"]
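Note that np.ones(x) times A gives A repeated B times, without the `* n` step from the question's cpt_p. A minimal sketch of a variant that includes it, assuming the intended result is A * [1, 2, ..., B] for each row (this is a reading of the question, not something this answer states):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [120, 108.6], 'B': [109, 147]})

# For each row: A * [1, 2, ..., B], stored as a plain Python list.
df["C"] = [(a * np.arange(1, b + 1)).tolist() for a, b in zip(df["A"], df["B"])]

print([len(c) for c in df["C"]])  # list lengths follow column B
print(df["C"].iloc[0][:3])        # first three values of the first row
```

Iterating the columns with zip() keeps B as an integer, so no int() conversion is needed here.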

edited Jun 27 '21 at 14:18

answered Jun 27 '21 at 14:12

score 0 · Answer 3 · answered Jun 27 '21 at 14:13

In case you get the following TypeError:

TypeError: can't multiply sequence by non-int of type 'float'

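A minor change to the cpt_p function is required: the value a list is multiplied by must be an integer, so convert B with int(). With axis=1, apply() passes each row as a single Series, and because column A holds floats, the value from column B arrives as a float too.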

import pandas as pd 
import numpy as np 

df = pd.DataFrame({'A': [120,108.6], 'B': [109, 147]})

def cpt_p(A, B):
    n = np.arange(1, B+1)
    p = [A] * int(B) # Creates a repeating value of A of length B i.e. [A, A, A, ...]
    return p * n


df['C'] = df.apply(lambda x: cpt_p(x['A'],x['B']), axis=1)

print(df)
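To see where the float comes from, here is a small sketch that inspects what apply() hands to the lambda. The variable names `row` and `row_types` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [120, 108.6], 'B': [109, 147]})

print(df.dtypes)   # column A is float64, column B is int64

# A single row is one Series, so both values must share one dtype.
row = df.iloc[0]
print(row.dtype)

# Type of x['B'] as seen inside apply(..., axis=1), one entry per row.
row_types = df.apply(lambda x: type(x['B']).__name__, axis=1).tolist()
print(row_types)

# Iterating the column directly keeps B as an integer.
print([type(b).__name__ for b in df['B']])
```

This is why int(B) is needed inside apply() with axis=1, while zip() or map() over the columns passes B through as an integer.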
