0

Comparing two Dataframes // Deciphering one Dataframe with another

Hello everyone and thanks for the help!

I have two dataframes. The first (df1) contains all my data, including a column with a long list of abbreviations (df1[ab]) which i want to translate into numbers via the second dataframe (df2). df2 contains two columns, one column with the same abbreviations (df2[key]) and one column with the related numbers (df2[value]).

My goal is to use the second dataframe as a deciphering tool for the first. I want to compare df1[ab] with df2[key] and create a new column in df1 which contains the correct numbers from df2[value] in the right order. Since the real list of abbreviations is quite long, i dont want to use a large number of "if-statements" to complete this task.

Example:

import pandas as pd
import numpy as np

abbreviations = ["Sl2","Sl4","Ss","Tu4","Slu","Su2/Su3","Ut2", "Ss","Sl2","Slu","Slu"]
dictab = {"ab": abbreviations}
df1 = pd.DataFrame(dictab)

key = ["Ss","Sl2","Sl3","Sl4","Slu"]
value = [11,25,27,30,33]
dictkv = {"key":key, "value":value}
df2 = pd.DataFrame(dictkv)

As a result, df1 should contain a new column df1[result] which should contain the following values in the following order:

print(df1)
         ab   result
0       Sl2       25 
1       Sl4       30
2        Ss       11
3       Tu4      NaN
4       Slu       33
5   Su2/Su3      NaN
6       Ut2      NaN
7        Ss       11
8       Sl2       25
9       Slu       33
10      Slu       33

Any help would be much appreciated!

Cheers, Jato

Jato
  • 61
  • 6

1 Answers1

1

If you just want the additional column:

df1.merge(df2, left_on='ab', right_on='key', how='left')

Output

         ab  key    value
0       Sl2  Sl2    25.0
1       Sl4  Sl4    30.0
2        Ss  Ss     11.0
3       Tu4  NaN     NaN
4       Slu  Slu    33.0
5   Su2/Su3  NaN     NaN
6       Ut2  NaN     NaN
7        Ss  Ss     11.0
8       Sl2  Sl2    25.0
9       Slu  Slu    33.0
10      Slu  Slu    33.0 

If you want the array of matches:

df1.merge(df2, left_on='ab', right_on='key', how='left').value.values

Output

array([25., 30., 11., nan, 33., nan, nan, 11., 25., 33., 33.])
Chris
  • 15,819
  • 3
  • 24
  • 37
  • Thanks a lot for your quick response! You solved my problem in a matter of minutes ^^ – Jato Sep 07 '20 at 16:26