0

I have two dataframes. I would like to loop through both dataframes. When the signal name in df_A[‘SignalName’] matches the id in df_B[‘id’], I would like to add the corresponding df_A[‘Translation’] as a new column in df_B.

I have found similar questions, but cannot seem to find a solution to this one.

Here are my dfs:

df_A:

dict_A = {'Location':['Loc1', 'Loc1', 'Loc1', 'Loc1'],
    'LocationII': ['Center', 'Center', 'Center', 'Center'], 
    'SignalName':['IV.701', 'IV.702', 'E.50', 'IV.703'],
   'Translation':['Signal A', 'Signal B', 'Signal C', 'Signal D']} 

df_A = pd.DataFrame(dict_A)  
print(df_A)

  Location LocationII SignalName Translation
0     Loc1     Center     IV.701    Signal A
1     Loc1     Center     IV.702    Signal B
2     Loc1     Center       E.50    Signal C
3     Loc1     Center     IV.703    Signal D

df_B:

dict_B = {'id':['RV.702', 'RV.74', 'E.50', 'AV.233'],
    'Code': ['val', 'val', 'val', 'val'], 
    'Inst':['Inst', 'Enit', 'Enit', 'Inst']} 

df_B = pd.DataFrame(dict_B)  
print(df_B)

       id Code  Inst
0  RV.702  val  Inst
1   RV.74  val  Enit
2    E.50  val  Enit
3  AV.233  val  Inst

This is my desired result:

       id Code Inst Translation
0  RV.702 val  Inst
1  RV.74  val  Enit
2  E.50   val  Enit Signal C
3  AV.233 val  Inst

I started by trying to create a new df where I see my matches:

df_comp=df_B.join(df_A, [df_B ['id'] == df_A['SignalName']], how='inner')

this results in error: 'ValueError: Can only compare identically-labeled Series objects' I am not sure why. Do I need to rename the key of on of the dfs?

Anyway, I realized this isn’t a good path to a nice solution. I probably need to directly access the index of the match in both dfs, and I am not sure how to do this.

Miss.Pepper
  • 103
  • 10
  • There are a lot of answers on SO for similar questions. You can use [DataFrame.merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas-dataframe-merge), df_B.merge(df_A[['SignalName', 'Translation']], left_on = 'id', right_on = 'SignalName', how = 'left').drop(['SignalName'],1).fillna('') – Vaishali Nov 06 '20 at 22:58
  • See also: [What is the difference between join and merge in Pandas?](https://stackoverflow.com/questions/22676081). – Bill Huang Nov 06 '20 at 23:01

1 Answers1

0
  1. Use set_index() on the two columns you want to join on with df_A = df_A.set_index('SignalName') and df_B = df_B.set_index('id')
  2. Only bring in the one column from the other dataframe with df_B.join(df_A[['Translation']]):

dict_A = {'Location':['Loc1', 'Loc1', 'Loc1', 'Loc1'],
    'LocationII': ['Center', 'Center', 'Center', 'Center'], 
    'SignalName':['IV.701', 'IV.702', 'E.50', 'IV.703'],
   'Translation':['Signal A', 'Signal B', 'Signal C', 'Signal D']} 

df_A = pd.DataFrame(dict_A)
df_A = df_A.set_index('SignalName')

dict_B = {'id':['RV.702', 'RV.74', 'E.50', 'AV.233'],
    'Code': ['val', 'val', 'val', 'val'], 
    'Inst':['Inst', 'Enit', 'Enit', 'Inst']} 
df_B = pd.DataFrame(dict_B)
df_B = df_B.set_index('id')

df_comp = df_B.join(df_A[['Translation']])
df_comp
Out[1]: 
       Code  Inst Translation
id                           
RV.702  val  Inst         NaN
RV.74   val  Enit         NaN
E.50    val  Enit    Signal C
AV.233  val  Inst         NaN

As a final step, you could also do: df_comp = df_comp.reset_index()

David Erickson
  • 16,433
  • 2
  • 19
  • 35