If text is contained in another dataframe then flag row with a binary designation

Question

I'm working on mining survey data. I was able to flag the rows for certain keywords:

survey['Rude'] = survey['Comment Text'].str.contains('rude', na=False, regex=True).astype(int)

Now, I want to flag any rows containing names. I have another dataframe that contains common US names. Here's what I thought would work, but it is not flagging any rows, and I have validated that names do exist in the 'Comment Text' column:

for row in survey:   
    for word in survey['Comment Text']:
        survey['Name'] = 0
        if word in names['Name']:
            survey['Name'] = 1

The survey df has 38,000 rows and the names df has 20,000 rows. — Justin, Mar 18 '20 at 18:51
Is comment text a string or list of words? Can you provide example input and output? — rcriii, Mar 18 '20 at 18:53
Is the `==` in `survey['Name'] == 1` just a typo in your post? — AMC, Mar 18 '20 at 18:56
Comment text examples: Example 1: Lines were long, didn't have the product I was looking for. Example 2: Very friendly staffVery clean inside and outAlways seem to have enough staff — Justin, Mar 18 '20 at 18:57
@AMC great catch! I changed it to = instead of ==. It still does not work though. — Justin, Mar 18 '20 at 19:01

rcriii · Accepted Answer · 2020-03-18T19:25:59.370

You are not looping through the series correctly. `for row in survey:` loops through the column names in `survey`. `for word in survey['Comment Text']:` loops through the comment strings (each whole comment, not individual words). `survey['Name'] = 0` creates a column of all 0s.
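A minimal sketch of that behaviour, using made-up toy data rather than the real survey. It also shows a related pitfall that may contribute to nothing being flagged: `in` applied to a pandas Series tests the index labels, not the values.

```python
import pandas as pd

# Toy frames, invented for illustration only.
survey = pd.DataFrame({'Comment Text': ['Justin was rude', 'Very friendly staff']})
names = pd.DataFrame({'Name': ['Justin', 'Susan']})

# Iterating over a DataFrame yields its column labels, not its rows.
columns_seen = list(survey)

# `in` on a Series checks the index labels (0, 1, ...), not the values.
by_index = 'Justin' in names['Name']
# Converting to a set first gives a membership test on the values.
by_value = 'Justin' in set(names['Name'])

print(columns_seen)
print(by_index, by_value)
```

With this toy data, `columns_seen` holds only the column label, `by_index` is false, and `by_value` is true.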

You could use a set intersection together with `apply()` to avoid explicitly looping through the rows:

    import pandas as pd

    survey = pd.DataFrame({'Comment_Text':['Hi rcriii',
                                           'Hi yourself stranger',
                                           'say hi to Justin for me']})
    names = pd.DataFrame({'Name':['rcriii', 'Justin', 'Susan', 'murgatroyd']})
    s2 = set(names['Name'])  # build the lookup set once, outside the function

    def is_there_a_name(s):
        # Split the comment on whitespace and check for overlap with the names
        s1 = set(s.split())
        if len(s1.intersection(s2)) > 0:
            return 1
        else:
            return 0

    survey['Name'] = survey['Comment_Text'].apply(is_there_a_name)

    print(names)
    print(survey)

             Name
    0      rcriii
    1      Justin
    2       Susan
    3  murgatroyd
                  Comment_Text  Name
    0                Hi rcriii     1
    1     Hi yourself stranger     0
    2  say hi to Justin for me     1

As a bonus, return len(s1.intersection(s2)) to get the number of matches per line.
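A sketch of that counting variant, under the same toy-data assumptions as the answer. The helper name `count_names` and the `Name_Count` column are hypothetical, and the third comment is changed so that one row has two matches.

```python
import pandas as pd

# Toy data modelled on the answer's example; the last comment is altered.
survey = pd.DataFrame({'Comment_Text': ['Hi rcriii',
                                        'Hi yourself stranger',
                                        'rcriii says hi to Justin']})
names = pd.DataFrame({'Name': ['rcriii', 'Justin', 'Susan', 'murgatroyd']})
name_set = set(names['Name'])

def count_names(comment):
    # Number of distinct names that appear as whitespace-separated words.
    return len(set(comment.split()) & name_set)

survey['Name_Count'] = survey['Comment_Text'].apply(count_names)
# The binary flag can be derived from the count.
survey['Name'] = (survey['Name_Count'] > 0).astype(int)

print(survey)
```

Note that `split()` breaks on whitespace only and the comparison is case-sensitive, so a token such as `Justin,` or `justin` would not match. Real survey text may need a tokenizer that strips punctuation and normalizes case first.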
