How to search for a word in a column with Pandas

Question

I have a pandas DataFrame that has reviews in it, and I want to search for a specific word in one of its columns.

df["Summary"].str.lower().str.contains("great", na=False)

This returns True or False for each row, but I want to create a new column with 1 or 0 written in the corresponding rows.

For example, if the review has 'great' in it, the value should be 1, otherwise 0. I tried this:

if df["Summary"].str.lower().str.contains("great", na=False) == True:
    df["Great"] = '1'
else:
    df["Great"] = '0'

It gives this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). How can I solve this?

Try [`np.where`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html): `df["Great"] = np.where(df["Summary"].str.lower().str.contains("great", na=False), '1', '0')` – 0x5453, May 17 '19 at 18:49

score 2 · Accepted Answer · answered May 17 '19 at 18:57

Since True/False corresponds to 1/0, all you need is an astype conversion from bool to int:

df['Great'] = df["Summary"].str.contains("great", case=False, na=False).astype(int)

Also note I've removed the str.lower call and added case=False as an argument to str.contains for a case insensitive comparison.
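To make the conversion concrete, here is a minimal sketch on made-up data. Only the "Summary" column name comes from the question; the sample reviews and the intermediate `mask` variable are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the "Summary" column name is from the question.
df = pd.DataFrame({"Summary": ["Great product", "Not so good", None, "GREAT value"]})

# case=False ignores letter case; na=False treats missing values as "no match".
mask = df["Summary"].str.contains("great", case=False, na=False)

# Casting the boolean mask to int turns True/False into 1/0.
df["Great"] = mask.astype(int)

print(df["Great"].tolist())  # [1, 0, 0, 1]
```

The missing review ends up as 0 because `na=False` fills it in before the cast.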

Another solution would be to lowercase and then disable the regex matching for better performance.

df['Great'] = (df["Summary"].str.lower()
                            .str.contains("great", regex=False, na=False)
                            .astype(int))
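Besides speed, `regex=False` also changes what counts as a match when the search term contains regex metacharacters. The sketch below uses a hypothetical search term, "great." with a trailing period, and invented reviews to show the difference.

```python
import pandas as pd

# Hypothetical reviews chosen to show how "." is interpreted.
reviews = pd.Series(["great!", "It was great.", "greatly improved"])

# Default (regex=True): "." is a wildcard, so any character after "great" matches.
as_regex = reviews.str.contains("great.", na=False)

# regex=False: the pattern is a plain substring, period included.
as_literal = reviews.str.contains("great.", regex=False, na=False)

print(as_regex.tolist())    # [True, True, True]
print(as_literal.tolist())  # [False, True, False]
```

For a plain word like "great" both modes give the same result, so the choice only matters for performance there.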

Finally, you can also use a list comprehension:

df['Great'] = [1 if 'great' in s.lower() else 0 for s in df['Summary']]

If the column may also contain non-string values (numbers or NaN, for example), use

df['Great'] = [
    1 if isinstance(s, str) and 'great' in s.lower() else 0 
    for s in df['Summary']
]
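As a rough illustration of why the `isinstance` check matters, the sketch below runs both list comprehensions on a made-up column that mixes strings, a number, and a missing value. The sample data and the `failed` flag are assumptions for the demo.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type column: strings, a number, and a missing value.
df = pd.DataFrame({"Summary": ["Great product", 5, np.nan, "not bad"]})

# Without the type check, the first non-string value breaks the loop.
try:
    [1 if "great" in s.lower() else 0 for s in df["Summary"]]
    failed = False
except AttributeError:
    failed = True  # int and float objects have no .lower() method

# With the isinstance guard, non-strings simply count as "no match".
df["Great"] = [
    1 if isinstance(s, str) and "great" in s.lower() else 0
    for s in df["Summary"]
]

print(failed)                # True
print(df["Great"].tolist())  # [1, 0, 0, 0]
```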

I've detailed the advantages of list comprehensions for object data ad nauseam in this post of mine: For loops with pandas - When should I care?

score 2 · Answer 2 · answered May 17 '19 at 18:57

Your condition df["Summary"].str.lower().str.contains("great", na=False) returns a Series of True/False values. It cannot be used directly in an if statement, because a Series is not a single Python boolean. Instead, you can do this to achieve what you want:

df['Great'] = df['Summary'].apply(lambda x: int('great' in x.lower()))
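The error from the question can be reproduced in a few lines. This is a minimal sketch on invented data; the `raised` flag is only there to make the outcome visible.

```python
import pandas as pd

# Hypothetical two-row frame; only the "Summary" column name is from the question.
df = pd.DataFrame({"Summary": ["Great product", "Not so good"]})
mask = df["Summary"].str.lower().str.contains("great", na=False)

# `mask == True` is still a Series, and `if` needs exactly one boolean.
try:
    if mask == True:
        pass
    raised = False
except ValueError:
    raised = True  # "The truth value of a Series is ambiguous..."

print(raised)      # True

# Reducing the Series gives a single boolean, but about the whole column,
# not about each row, so it does not help for building a per-row 0/1 column.
print(mask.any())  # True
print(mask.all())  # False
```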

answered May 17 '19 at 18:57 by NBWL

`apply` has limited use cases and should be avoided when there are better (read: vectorized/inbuilt) alternatives. You can read more [here](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code). – cs95 May 17 '19 at 19:04

Thanks for this, I'm going to start using the .str accessor over apply now – NBWL May 17 '19 at 19:09
Happy coding :)) – cs95 May 17 '19 at 19:10

score 0 · Answer 3 · edited May 17 '19 at 18:54

A possible solution using numpy

import numpy as np
df["Great"] = np.where(df["Summary"].str.lower().str.contains("great", na=False), '1', '0')

Check the np.where documentation for details.
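One thing worth noticing is that the values passed to `np.where` decide the type of the new column: '1' and '0' give strings, while 1 and 0 give integers. The sketch below compares both on made-up data; the column names `Great_str` and `Great_int` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "Summary" column name is from the question.
df = pd.DataFrame({"Summary": ["Great product", "Not so good", None]})
mask = df["Summary"].str.lower().str.contains("great", na=False)

# String outputs, as in the snippet above.
df["Great_str"] = np.where(mask, "1", "0")

# Integer outputs, which are easier to sum or average later.
df["Great_int"] = np.where(mask, 1, 0)

print(df["Great_str"].tolist())  # ['1', '0', '0']
print(df["Great_int"].tolist())  # [1, 0, 0]
```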

edited May 17 '19 at 18:54 by Nazim Kerimbekov
answered May 17 '19 at 18:53 by David Sidarous
