pandas - If partial string match exists, put value in new column

Question

I've got a tricky problem in pandas to solve. I was previously referred to this thread as a solution but it is not what I am looking for.

Take this example dataframe with two columns:

df = pd.DataFrame([['Mexico', 'Chile'], ['Nicaragua', 'Nica'], ['Colombia', 'Mex']], columns = ["col1", "col2"])

I first want to check each row in column 2 to see if that value exists in column 1. This is checking full and partial strings.

df['compare'] = df['col2'].apply(lambda x: 'Yes' if df['col1'].str.contains(x).any() else 'No')

I can check to see that I have a match of a partial or full string, which is good but not quite what I need. Here is what the dataframe looks like now:

What I really want is the value from column 1 which the value in column 2 matched with. I have not been able to figure out how to associate them

My desired result looks like this:

What should happen in a value in `col2` is a partial match to more than one value in `col1`? — jfaccioni, Feb 25 '21 at 13:49
Use `df['compare'] = np.where(df['compare'] == 'Yes', df['col1'], None)` — jezrael, Feb 25 '21 at 13:50
@jezrael this is not a duplicate (at least not for the thread you linked earlier) — Erich Purpur, Feb 25 '21 at 13:55
@jfaccioni In both my example here and my real dataset that would not happen. But if it did, any matched value from `col1` would be ok — Erich Purpur, Feb 25 '21 at 13:59

score 2 · Accepted Answer · answered Feb 25 '21 at 14:11

Here's a "pandas-less" way to do it. Probably not very efficient but it gets the job done:

def compare_cols(match_col, partial_col):
    series = []
    for partial_str in partial_col:
        for match_str in match_col:
            if partial_str in match_str:
                series.append(match_str)
                break  # matches to the first value found in match_col
        else:  # for loop did not break = no match found
                series.append(None)
    return series

df = pd.DataFrame([['Mexico', 'Chile'], ['Nicaragua', 'Nica'], ['Colombia', 'Mex']], columns = ["col1", "col2"])

df['compare'] = compare_cols(match_col=df.col1, partial_col=df.col2)

Note that if a string in col2 matches to more than one string in col1, the first occurrence is used.

pandas - If partial string match exists, put value in new column

1 Answers1