The data as pandas dataframe df1:
id | name | year | month | day | hour | minute + identifier
--------------------------------------------------
1 | Steve | 2018 | 12 | 01 | 00 | 00
2 | Jeff | 2018 | 12 | 01 | 00 | 00
3 | Elon | 2018 | 12 | 01 | 00 | 00
4 | Satya | 2018 | 12 | 01 | 00 | 00
5 | Tim | 2018 | 12 | 01 | 00 | 00
6 | Mark | 2018 | 12 | 01 | 00 | 00
7 | Steve | 2018 | 12 | 01 | 00 | 15
8 | Jeff | 2018 | 12 | 01 | 00 | 15
9 | Elon | 2018 | 12 | 01 | 00 | 15
10 | Satya | 2018 | 12 | 01 | 00 | 15
11 | Tim | 2018 | 12 | 01 | 00 | 15
12 | Mark | 2018 | 12 | 01 | 00 | 15
13 | Steve | 2018 | 12 | 01 | 00 | 30
Another dataframe df2:
id | name | identifier
--------------------------
1 | Steve | ap1
2 | Jeff | am1
3 | Elon | te1
4 | Satya | ms1
5 | Tim | ap2
6 | Mark | fb1
A column has to be added to df1 by comparing it to a column of df2 with the names as identifiers. If the name match, the corresponding value has to be added.
My current solution:
for i in range(len(df1)):
_ = df1['name'].str.contains(name['name'].tolist()[i])
df_target['identifier'][_] = sws_stations['identifier'].tolist()[i]
At first I create a boolean series _ and assign row wise. This operation takes very long, because it's a dataset of million entries. I thought of using something more efficient, but I'm stuck. (I can only guess to use list comprehensions or dicts)