Pandas create a frame whose entries are the value of a function applied to the respective entries of other DataFrames

Question

I have three frames, let's say:

ID    A  B   | ID    A B  | ID    A B
john  *  1   | john  # 2  | john  @ 3
paul  1  1   | paul  2 2  | paul  3 3
jones 1  1   | jones 2 2  | jones 3 3

and I have to create a new DataFrame where each entry is the result of a function whose arguments are the corresponding entries of the three frames:

ID    A        B
john f(*,#,@) f(1,2,3)
...

I'm new to pandas, and the only approach I know is to convert the frames to NumPy arrays and work on them as you would on three matrices, but I would prefer to solve this problem the pandas way.

I already tried looking for other questions on SO but couldn't find anything; it's possible that's due to how I've formulated my question.

How is `f()` written? Does it take three arrays or three scalars/objects? — ernest_k, Dec 30 '20 at 17:53

Danail Petrov · Accepted Answer · 2020-12-30T19:28:29.030

I'm not sure exactly what you're trying to do, but here is something:

# Define dummy function (f)
def f(x):
    # you can use here x.name, x.A, x.B 
    # >>> x.name
    # 'paul'
    # >>> x.A
    # ['1', '2', '3']
    # >>> x.B
    # [1, 2, 3]
    return x

>>> df1
      ID  A  B
0   john  *  1
1   paul  1  1
2  jones  1  1

>>> df2
      ID  A  B
0   john  #  2
1   paul  2  2
2  jones  2  2

>>> df3
      ID  A  B
0   john  @  3
1   paul  3  3
2  jones  3  3

>>> pd.concat([df1,df2,df3]).groupby('ID').agg(list).apply(f, axis=1)
               A          B
ID
john   [*, #, @]  [1, 2, 3]
jones  [1, 2, 3]  [1, 2, 3]
paul   [1, 2, 3]  [1, 2, 3]
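
To go from those per-cell lists to a single value per cell, one option is to unpack each list into a function of three scalars. The following is only a sketch under some assumptions: `combine` is a hypothetical stand-in for the asker's `f` (here it just joins the three values as text), and it relies on every ID appearing exactly once in each frame, so that each list holds the values in frame order.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["john", "paul", "jones"], "A": ["*", "1", "1"], "B": [1, 1, 1]})
df2 = pd.DataFrame({"ID": ["john", "paul", "jones"], "A": ["#", "2", "2"], "B": [2, 2, 2]})
df3 = pd.DataFrame({"ID": ["john", "paul", "jones"], "A": ["@", "3", "3"], "B": [3, 3, 3]})

def combine(a, b, c):
    # Hypothetical stand-in for f: concatenate the three entries as strings.
    return f"{a}{b}{c}"

# Stack the frames and collect, per ID and column, the three entries as a list.
collected = pd.concat([df1, df2, df3]).groupby("ID").agg(list)

# Unpack every list into combine(); Series.map is applied column by column.
result = collected.apply(lambda col: col.map(lambda vals: combine(*vals)))
print(result)
```

Note that `groupby` sorts the IDs, so the rows come back in alphabetical order rather than in the original order.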

score 2 · Answer 2 · answered Dec 30 '20 at 18:00

import pandas as pd

If you have:

df0=pd.DataFrame.from_dict({'ID':['john','paul'],'A':1,'B':2})
df1=pd.DataFrame.from_dict({'ID':['john','paul'],'A':3,'B':4})
df2=pd.DataFrame.from_dict({'ID':['john','paul'],'A':5,'B':6})

Merge these 3 dataframes:

merged=df0.merge(df1, on='ID').merge(df2, on='ID')
merged.columns=['ID','A0','B0','A1','B1','A2','B2']

Define an example function f:

def f(a,b,c):
    return sum([a,b,c]) # for example

Create dataframe for result:

result=pd.DataFrame()
result['ID']=merged['ID']

Calculate A and B column in the result dataframe by applying the f function defined above:

result['A']=merged.apply(lambda row: f(row['A0'],row['A1'],row['A2']),axis=1)
result['B']=merged.apply(lambda row: f(row['B0'],row['B1'],row['B2']),axis=1)

The result will be:

     ID  A   B
0  john  9  12
1  paul  9  12
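
If there are more columns than just A and B, renaming the merged columns by hand gets tedious. A possible variation, sketched here under the assumption that all frames share the same IDs and column names, is to index every frame by ID and loop over the columns. The names `frames` and `aligned` are made up for this illustration.

```python
import pandas as pd

df0 = pd.DataFrame({"ID": ["john", "paul"], "A": 1, "B": 2})
df1 = pd.DataFrame({"ID": ["john", "paul"], "A": 3, "B": 4})
df2 = pd.DataFrame({"ID": ["john", "paul"], "A": 5, "B": 6})

def f(a, b, c):
    return sum([a, b, c])  # same example function as above

# Index every frame by ID, then reorder rows to match the first frame.
frames = [df.set_index("ID") for df in (df0, df1, df2)]
aligned = [fr.reindex(frames[0].index) for fr in frames]

# For each column, walk the three frames in parallel and apply f cell by cell.
result = pd.DataFrame(
    {
        col: [f(a, b, c) for a, b, c in zip(*(fr[col] for fr in aligned))]
        for col in aligned[0].columns
    },
    index=aligned[0].index,
).reset_index()
print(result)
```

When `f` consists only of operations that pandas can vectorize (as `sum` does here), calling `f(*aligned)` on the whole frames should give the same values without the explicit loop.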
