Update column values based on other columns

Question

I have a weak grasp of Pandas and not a strong understanding of Python.

I want to update a column (d.Alias) based on the values of existing columns (d.Company and d2.Alias). d.Alias should be set to d2.Alias wherever d2.Alias is a substring of d.Company.

Example datasets:

import numpy as np

d = {'Company': ['The Cool Company Inc', 'Cool Company, Inc',
                 'The Cool Company', 'The Shoe Company',
                 'Muffler Store', 'Muffler Store'],
     'Position': ['Cool Job A', 'Cool Job B', 'Cool Job C', 'Salesman',
                  'Sales', 'Technician'],
     'City': ['Tacoma', 'Tacoma', 'Tacoma', 'Boulder', 'Chicago', 'Chicago'],
     'State': ['AZ', 'AZ', 'AZ', 'CO', 'IL', 'IL'],
     'Alias': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
d2 = {'Company': ['The Cool Company, Inc.', 'The Shoe Company',
                  'Muffler Store LLC'],
      'Alias': ['Cool Company', np.nan, 'Muffler'],
      'First Name': ['Carol', 'James', 'Frankie'],
      'Last Name': ['Fisher', 'Smith', 'Johnson']}

The Shoe Company has np.nan as its alias because no alias is needed for it.

I have tried using .loc, for loops, while loops, pandas.where, numpy.where, and several variations of each with no desirable outcomes. When using a for loop, the end of d2.Alias was copied to all rows in d.Alias. I have not been able to reproduce that, however.

Previous posts that I looked at but either couldn't get to work or didn't understand:
Conditionally fill column with value from another DataFrame based on row match in Pandas
pandas create new column based on values from other columns

Any help is greatly appreciated!

EDIT:

Expected output

Update:
After a few days of tinkering I reached the desired outcome. With Wen's response I had to change a couple of things.

First, I created a list from df2.Alias called aliases:
aliases = df2.Alias.unique()

Then I had to remove the trailing .map(df2.set_index('Company').Alias) call. The line that produced my desired results:
df1['Alias'] = df1.Company.apply(lambda x: [process.extract(x, aliases, limit=1)][0][0][0])
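
The `[...][0][0][0]` chain in that line can be hard to read, so here is a small sketch of what each index does. It assumes that `process.extract` returns a list of `(match, score)` tuples when given a plain list-like of choices. The `result` value is a hand-written stand-in, not real fuzzywuzzy output, and the score is made up.

```python
# Hypothetical stand-in for process.extract(x, aliases, limit=1):
# assumed shape is a list holding one (match, score) tuple.
# The score 90 is invented purely for illustration.
result = [("Cool Company", 90)]

wrapped = [result]           # the extra [...] around the call in the original line
first = wrapped[0]           # 1st [0]: undoes that wrapping, back to `result`
best_pair = first[0]         # 2nd [0]: the best (match, score) tuple
alias = best_pair[0]         # 3rd [0]: the matched string itself

print(alias)
```

Under that assumption the outer brackets and the first `[0]` cancel out, so `result[0][0]` would give the same string.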

Can you show us the expected output? It is unclear to me what you meant by "if `d2.Alias` is contained in `d.Company`" — harvpan, May 18 '18 at 20:26
Added an outcome and changed "if d2.Alias is contained in d.Company" @HarvIpan — Chris, May 18 '18 at 20:40

ALollz · Answer 1 · 2018-05-18T20:46:03.553

One approach is to loop through d2, which is presumably the much smaller DataFrame, find the rows where each alias is a substring of d.Company, and set d.Alias to that alias on those rows.

import pandas as pd
d = pd.DataFrame(d)
d2 = pd.DataFrame(d2)

for row in d2[d2.Alias.notnull()].itertuples():
    d.loc[d.Company.str.contains(row.Alias), 'Alias'] = row.Alias

print(d)
#          Alias     City               Company    Position State
#0  Cool Company   Tacoma  The Cool Company Inc  Cool Job A    AZ
#1  Cool Company   Tacoma     Cool Company, Inc  Cool Job B    AZ
#2  Cool Company   Tacoma      The Cool Company  Cool Job C    AZ
#3           NaN  Boulder      The Shoe Company    Salesman    CO
#4       Muffler  Chicago         Muffler Store       Sales    IL
#5       Muffler  Chicago         Muffler Store  Technician    IL
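
A self-contained variant of the loop above, as a sketch rather than a definitive version. It makes two changes that are not in the original answer: `regex=False`, so an alias containing characters like `.` or `(` is matched literally, and casting `Alias` to `object` first, so strings are not assigned into an all-NaN float column. Only the `Company` and `Alias` columns from the question are kept.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    "Company": ["The Cool Company Inc", "Cool Company, Inc", "The Cool Company",
                "The Shoe Company", "Muffler Store", "Muffler Store"],
    "Alias": [np.nan] * 6,
})
d2 = pd.DataFrame({
    "Company": ["The Cool Company, Inc.", "The Shoe Company", "Muffler Store LLC"],
    "Alias": ["Cool Company", np.nan, "Muffler"],
})

# An all-NaN column is float64; switch to object before writing strings into it.
d["Alias"] = d["Alias"].astype(object)

# Skip missing aliases, then mark every company name that contains the alias.
for alias in d2["Alias"].dropna().unique():
    # regex=False: treat the alias as literal text, not as a regular expression.
    mask = d["Company"].str.contains(alias, regex=False)
    d.loc[mask, "Alias"] = alias

print(d)
```

If two aliases both occur in the same company name, the one processed last wins, since each pass overwrites the matching rows.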

I recommend `df.itertuples` instead of `df.iterrows` for performance; otherwise this looks good +1. — jpp, May 18 '18 at 20:44
Thank you for the help. When I tried this, the rows for 'The Cool Company' didn't have anything in `'Alias'`. The other rows seemed to work just fine though. — Chris, May 24 '18 at 12:40

score 2 · Accepted Answer · answered May 18 '18 at 20:43

A solution using fuzzywuzzy:

from fuzzywuzzy import process

df1['Alias'] = df1.Company.apply(lambda x: [process.extract(x, df2.Company, limit=1)][0][0][0]).map(df2.set_index('Company').Alias)
df1
Out[31]: 
          Alias     City               Company    Position State
0  Cool Company   Tacoma  The Cool Company Inc  Cool Job A    AZ
1  Cool Company   Tacoma     Cool Company, Inc  Cool Job B    AZ
2  Cool Company   Tacoma      The Cool Company  Cool Job C    AZ
3           NaN  Boulder      The Shoe Company    Salesman    CO
4       Muffler  Chicago         Muffler Store       Sales    IL
5       Muffler  Chicago         Muffler Store  Technician    IL
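
The one-liner does two things: it finds the df2 company name closest to each df1 company name, then looks up that name's alias. The sketch below separates the two steps. Note that it swaps fuzzywuzzy for the standard library's `difflib.get_close_matches`, so it runs without extra installs. That is a different similarity measure, so on other data it may not pick the same matches as `process.extract`. The `closest_company` helper and the `Matched` column are names introduced here for illustration.

```python
import difflib

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Company": ["The Cool Company Inc", "Cool Company, Inc", "The Cool Company",
                "The Shoe Company", "Muffler Store", "Muffler Store"],
})
df2 = pd.DataFrame({
    "Company": ["The Cool Company, Inc.", "The Shoe Company", "Muffler Store LLC"],
    "Alias": ["Cool Company", np.nan, "Muffler"],
})

choices = df2["Company"].tolist()


def closest_company(name):
    """Return the df2 company name most similar to `name` (illustrative helper)."""
    # n=1 keeps only the best candidate; cutoff=0.0 means some candidate is
    # always returned, even when the best match is poor.
    return difflib.get_close_matches(name, choices, n=1, cutoff=0.0)[0]


# Step 1: map each df1 company to its most similar df2 company name.
df1["Matched"] = df1["Company"].apply(closest_company)

# Step 2: look up the alias for that name (NaN where df2 has no alias).
df1["Alias"] = df1["Matched"].map(df2.set_index("Company")["Alias"])

print(df1)
```

Because the closest candidate is always accepted, unrelated company names would still be assigned some alias. Raising `cutoff` (and handling the empty-list case) is one way to guard against that.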

Thanks for introducing me to `fuzzywuzzy`, +1. — harvpan, May 18 '18 at 20:45
