Pandas: Efficiently compare two columns and assign match to new column

Question

The data as pandas dataframe df1:

id | name   | year | month | day | hour | minute    + identifier
--------------------------------------------------
  1 | Steve  | 2018 |    12 |  01 |   00 |   00
  2 | Jeff   | 2018 |    12 |  01 |   00 |   00
  3 | Elon   | 2018 |    12 |  01 |   00 |   00
  4 | Satya  | 2018 |    12 |  01 |   00 |   00
  5 | Tim    | 2018 |    12 |  01 |   00 |   00
  6 | Mark   | 2018 |    12 |  01 |   00 |   00
  7 | Steve  | 2018 |    12 |  01 |   00 |   15
  8 | Jeff   | 2018 |    12 |  01 |   00 |   15
  9 | Elon   | 2018 |    12 |  01 |   00 |   15
 10 | Satya  | 2018 |    12 |  01 |   00 |   15
 11 | Tim    | 2018 |    12 |  01 |   00 |   15
 12 | Mark   | 2018 |    12 |  01 |   00 |   15
 13 | Steve  | 2018 |    12 |  01 |   00 |   30

Another dataframe df2:

 id | name   | identifier  
--------------------------
  1 | Steve  |    ap1
  2 | Jeff   |    am1  
  3 | Elon   |    te1   
  4 | Satya  |    ms1  
  5 | Tim    |    ap2  
  6 | Mark   |    fb1

A column has to be added to df1 by comparing it to a column of df2 with the names as identifiers. If the name match, the corresponding value has to be added.

My current solution:

for i in range(len(df1)):
    _ = df1['name'].str.contains(name['name'].tolist()[i])
    df_target['identifier'][_] = sws_stations['identifier'].tolist()[i]

At first I create a boolean series _ and assign row wise. This operation takes very long, because it's a dataset of million entries. I thought of using something more efficient, but I'm stuck. (I can only guess to use list comprehensions or dicts)

Check `df1.merge(df2[['name', 'identifier']], on='name', how='left')` — Shubham Sharma, Feb 14 '21 at 09:26

Pandas: Efficiently compare two columns and assign match to new column

0 Answers0