Assign values to a new column from different datafram if column list contains substring

Question

I have two dataframes:

df1 = pd.DataFrame(list(zip(['name1, Name2, name5', 'name4, name3', 'name6xx'],
                            [150, 230, 'name6xx'])),
                    columns=['name', 'compound1'])
df1

df2 = pd.DataFrame(list(zip(['name1 ppl', 'PPL name2', 'Name3 PPL', 'name4 ppl', 'name5 ppl', 'name6xx'])), columns=['name'])
df2

How can I assign values to df2 based on list ofdf1.name?

I want a table like this:

df2 = pd.DataFrame(list(zip(['name1 ppl', 'PPL name2', 'Name3 PPL', 'name4 ppl', 'name5 ppl', 'name 6xx'],
                           [150,150,230,230,150,'name6xx'])),
                    columns=['name', 'compound'])
df2

Here is my code:

def match(name):
    cond = df1['name'].str.contains(name, case=False)[0]
    if cond:
        return df1.loc[cond, 'compound1'].values[0]
    else:
        return name
df2.compound_new = df2.name.apply(match)

score 1 · Accepted Answer · answered Nov 03 '22 at 13:01

1

You can craft a regex, extract the values and map the matches:

import re

s = (df1.assign(name=df1['name'].str.lower().str.split(',\s*'))
        .explode('name').set_index('name')['compound1']
    )

regex = '|'.join(map(re.escape, s.index))
# 'name1|name2|name3|name4|name5|name6xx'

df2['compound'] = (df2['name'].str.lower()
                   .str.extract(f'({regex})', expand=False)
                   .map(s)
                  )

output:

        name compound
0  name1 ppl      150
1  PPL name2      150
2  Name3 PPL      150
3  name4 ppl      230
4  name5 ppl      230
5    name6xx  name6xx

answered Nov 03 '22 at 13:01

mozway

194,879
13
39
75

Thanks so much for the speedy solution. Why doesn’t my code work? – Joe Nov 03 '22 at 13:07
@Joe because your regex is too restricted. You would have needed: `df2['name'].str.replace('\s*ppl\s*', '', regex=True, case=False).map(s)` (with the `s` from my answer) – mozway Nov 03 '22 at 13:09

Assign values to a new column from different datafram if column list contains substring

1 Answers1