0

I'm trying to match two dataframes by the strings they have in common. This is the code for example:

import pandas as pd
import numpy as np
A = ['DF-PI-05', 'DF-PI-09', 'DF-PI-10', 'DF-PI-15', 'DF-PI-16',
       'DF-PI-19', 'DF-PI-89', 'DF-PI-92', 'DF-PI-93', 'DF-PI-94',
       'DF-PI-95', 'DF-PI-96', 'DF-PI-25', 'DF-PI-29', 'DF-PI-30',
       'DF-PI-34', 'DF-PI-84']

B = ['PI-05', 'PI-10', 'PI-89', 'PI-90', 'PI-93', 'PI-94', 'PI-95',
       'PI-96', 'PI-09', 'PI-15', 'PI-16', 'PI-19', 'PI-91A', 'PI-91b',
       'PI-92', 'PI-25-CU', 'PI-29', 'PI-30', 'PI-34', 'PI-84-CU-S1',
       'PI-84-CU-S2']


A = pd.DataFrame(A,columns=['ID'])
B = pd.DataFrame(B,columns=['Name'])
B['param_2'] = np.linspace(20,300,21)
A['param_1'] = np.linspace(0,20,17)

I would like to have ONE pandas dataframe with the shape (21,3) where the columns are ['Name','param_1',param_2]. where if the column 'ID' from pandas dataframe B matches the column 'Name' from pandas dataframe A, I get the values from the other columns in A and B, if the 'Name' doesn't match, I have NaN.

I have tried the code below but doesn't work.

B['param_1'] = np.nan
for date_B, row_B in B.iterrows():
    
    for date_A, row_A in A.iterrows():
        if row_B['Name'] in row_A['ID']:
            row_B['param_1'] = row_A['param_1']


            break

I also would prefer some solution without for loops, because I have a large dataset, perhaps with pandas.isin() function, I have tried but didn't manage to get the right result. Something like in this example:

Join dataframes based on partial string-match between columns

JCV
  • 447
  • 1
  • 5
  • 15

1 Answers1

1

Try left join:

A["_ID"] = A.ID.str[3:]
pd.merge(B,A, how="left", left_on="Name", right_on="_ID")[['Name','param_1', 'param_2']]
s510
  • 2,271
  • 11
  • 18
  • This code doesn't work, it gives NaN in column 'param_1' instead of picking the value from dataframe A – JCV Sep 13 '22 at 14:35
  • this is because your data in A and B do not match at all. e.g. in A it is `DF-PI-05` and in B it is `PI-05` – s510 Sep 13 '22 at 14:38
  • That's the question, because of that I was trying a for loop with 'in'. If a subset of the string in the Name/ID match, I need that value – JCV Sep 13 '22 at 15:03
  • @JenifferBarreto updated, try this now. – s510 Sep 13 '22 at 16:56
  • Thank you. That works in this specific case when the matching is the end of the ID. If the word to match is somewhere in the middle, will not work. I was wondering if would be possible to do it with the pandas.isin function – JCV Sep 14 '22 at 10:26