0

I am working on two data-frames which have different column names and dimensions.

First data-frame "df1" contains single column "name" that has names need to be located in second data-frame. If matched, value from df2 first column df2[0] needs to be returned and added in the result_df

Second data-frame "df2" has multiple columns and no header. This contains all the possible diminutive names and full names. Any of the column can have the "name" that needs to be matched

Goal: Locate the name in "df1" in "df2" and if it is matched, return the value from first column of the df2 and add in the respective row of df1

df1

name
ab
alex
bob
robert
bill

df2

0 1 2 3
abram ab
robert rob bob robbie
alexander alex al
william bill

result_df

name matched_name
ab abram
alex alexander
bob robert
robert robert
bill william

The code i have written so far is giving error. I need to write it as an efficient code as it will be checking millions of entries in df1 with df2:

''' result_df = process_name(df1, df2)

def process_name(df1, df2):

for elem in df2.values:
    
    if elem in df1['name']:
        df1["matched_name"] = df2[0]

'''

Samy
  • 47
  • 7

1 Answers1

1

Try via concat(),merge(),drop() and rename() and reset_index() method:

df=(pd.concat((df1.merge(df2,left_on='name',right_on=x) for x in df2.columns))
    .drop(['1','2','3'],1)
    .rename(columns={'0':'matched_name'})
    .reset_index(drop=True))

Output of df:

    name    matched_name
0   robert  robert
1   ab      abram
2   alex    alexander
3   bill    william
4   bob     robert
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
  • 1
    Thanks, it worked. Rename wasn't working because of different datatypes so I converted the datatype. – Samy May 26 '21 at 05:24
  • @Samy ohh...that's great.....btw if this answer helped you then try considering accepting the answer to indicate others that the issue is resolves....Thanks **:)** – Anurag Dabas May 26 '21 at 06:21