I have two DataFrames, df1 and df2. For each row in df1, I want to find all rows in df2 with the same id and collect their column "B" values into a new column of df1.
import pandas as pd

data = {'id': [1, 2, 3]}
df1 = pd.DataFrame(data)
data = {'id': [1, 1, 3, 3, 3], 'B': ['ab', 'bc', 'ad', 'ds', 'sd']}
df2 = pd.DataFrame(data)
Sizes of the real data: df1 has the column id (15k rows); df2 has the columns id and col1 (50M rows).
Desired output:
data = {'id': [1,2,3],'B':['[ab,bc]','[]','[ad,ds,sd]']}
pd.DataFrame(data)
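Current code:
def func(row):
    # row is a single row of df1; turn it into a one-row DataFrame
    # and right-merge it with df2 to pick up the matching rows
    temp3 = df2.merge(pd.DataFrame(data=[row.values] * len(row), columns=row.index),
                      how='right', on=['id'])
    temp1 = temp3.B.values
    return temp1

df1['B'] = df1.apply(func, axis=1)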
I am using merge to do the filtering and apply to run the function on every row of df1. The code takes about 1 hour to run on the large DataFrames. How can I make it run faster?
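One direction that might help, as a minimal sketch on the sample data only (not timed on the 50M-row frame): group df2 by id once and look the groups up from df1, instead of merging once per row. The names matched, b_lists and result are made up for illustration, and the sketch produces Python lists rather than the string form shown in the desired output.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3]})
df2 = pd.DataFrame({'id': [1, 1, 3, 3, 3], 'B': ['ab', 'bc', 'ad', 'ds', 'sd']})

# Keep only the df2 rows whose id occurs in df1, so the groupby touches less data.
matched = df2[df2['id'].isin(df1['id'])]

# Collect the B values of each id into one list (a single pass, no per-row merge).
b_lists = matched.groupby('id')['B'].agg(list)

# Look up each df1 id; ids with no match in df2 come back as NaN, so swap in [].
result = df1.copy()
result['B'] = result['id'].map(b_lists)
result['B'] = result['B'].apply(lambda v: v if isinstance(v, list) else [])

print(result)
```

The idea is that the expensive work (scanning df2) happens once, and the per-row step on df1 becomes an index lookup. Whether this is fast enough on the real data would still need to be measured.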