Pandas: How many times does a string appear in a dataframe cell?

Question

I believe I have a simple problem. I have a pandas DataFrame df that looks quite similar to this:

import pandas as pd

data = [{"Text" : "Dog", "Dog" : 1},
        {"Text" : "Cat", "Dog" : 0}, 
        {"Text" : "Mouse", "Dog" : 0}, 
        {"Text" : "Dog", "Dog" : 1}]

df = pd.DataFrame(data)

I am trying to search the Text column for a number of keywords and count how many times each one appears in each cell. The result should be stored in a new column per keyword showing how many times that keyword was found, just like the Dog column.

I tried using pandas str.count, and it works just fine. But as soon as I try to store the result in a new column, I run into trouble:

mykewords = ('Cat', 'Mouse')
df['Cat'] = df.Text.str.count("Cat")

I get the following warning (a SettingWithCopyWarning):

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

I have two questions:

What am I doing wrong, and how can I solve it?
How can I loop through all the keywords in mykeywords and get one column for each?

Thank you very much in advance for any help!

Is there only one keyword in each cell of the `Text` column? Or is something like `data = [{"Text" : "Dog Cat", "Dog" : 1}, {"Text" : "Cat Cat", "Dog" : 0}, {"Text" : "Mouse Cat", "Dog" : 0}, {"Text" : "Dog", "Dog" : 1}]` possible? – jezrael Mar 04 '19 at 07:11

score 2 · Answer 1 · answered Mar 04 '19 at 07:18

Just update pandas to the latest version and try the code below. It works like a charm for me.

import pandas as pd
data = [{"Text" : "Dog", "Dog" : 1},
        {"Text" : "Cat", "Dog" : 0}, 
        {"Text" : "Mouse", "Dog" : 0}, 
        {"Text" : "Dog", "Dog" : 1}]

df = pd.DataFrame(data)
mykewords = ['Cat', 'Mouse']
for i in mykewords:
    df[i] = df.Text.str.count(i)
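If you would rather not modify `df` in place, a minimal sketch (assuming the same `Text` column and keyword list as above) builds every count column in a single `DataFrame.assign` call, which returns a new DataFrame instead of changing the original. The helper name `add_keyword_counts` is hypothetical, not part of pandas.

```python
import pandas as pd


def add_keyword_counts(frame: pd.DataFrame, keywords) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with one count column per keyword."""
    # assign() works on a copy of `frame`, so the caller's DataFrame is left untouched.
    # Note: str.count treats each keyword as a regular expression.
    return frame.assign(**{kw: frame["Text"].str.count(kw) for kw in keywords})


data = [{"Text": "Dog", "Dog": 1},
        {"Text": "Cat", "Dog": 0},
        {"Text": "Mouse", "Dog": 0},
        {"Text": "Dog", "Dog": 1}]
df = pd.DataFrame(data)

result = add_keyword_counts(df, ["Cat", "Mouse"])
print(result)
```

Because the new columns are added to a fresh object, this style also avoids accidentally writing into a slice of another DataFrame.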

Parul Garg

The same solution was posted 9 minutes ago; please check my answer. – jezrael Mar 04 '19 at 07:18

Right, your solution is almost the same as mine, but here I am using a list instead of a tuple, @jezrael. – Parul Garg Mar 04 '19 at 07:20

jezrael · Accepted Answer · 2019-03-04T07:28:57.163

If a cell can contain multiple values and you need to count them:

mykewords = ('Cat', 'Mouse')
for x in mykewords:
    df[x] = df.Text.str.count(x)

A better solution is to use word boundaries with Series.str.findall and Series.str.len:

for x in mykewords:
    df[x] = df.Text.str.findall(r"\b{}\b".format(x)).str.len()

Difference between the solutions:

import pandas as pd

data = [{"Text" : "Dog Cat Catman", "Dog" : 1},
        {"Text" : "Cat Cat", "Dog" : 0}, 
        {"Text" : "Mouse Cat", "Dog" : 0}, 
        {"Text" : "Dog", "Dog" : 1}]

df = pd.DataFrame(data)
df1 = df.copy()
print (df)
   Dog            Text
0    1  Dog Cat Catman
1    0         Cat Cat
2    0       Mouse Cat
3    1             Dog

mykewords = ('Cat', 'Mouse')

for x in mykewords:
    df[x] = df.Text.str.findall(r"\b{}\b".format(x)).str.len()
print (df)
   Dog            Text  Cat  Mouse
0    1  Dog Cat Catman    1      0 <- does not match Catman
1    0         Cat Cat    2      0
2    0       Mouse Cat    1      1
3    1             Dog    0      0

for x in mykewords:
    df1[x] = df1.Text.str.count(x)
print (df1)
   Dog            Text  Cat  Mouse
0    1  Dog Cat Catman    2      0 <- matches Catman
1    0         Cat Cat    2      0
2    0       Mouse Cat    1      1
3    1             Dog    0      0
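Series.str.count also interprets its pattern as a regular expression and counts the matches, so the word-boundary pattern can be passed to it directly and gives the same numbers as findall + str.len. A minimal sketch, assuming keywords might contain regex metacharacters (the keyword `C.t` below is a hypothetical example), escapes each keyword with `re.escape` first. The helper name `count_whole_words` is also hypothetical.

```python
import re

import pandas as pd


def count_whole_words(text: pd.Series, keyword: str) -> pd.Series:
    """Hypothetical helper: count whole-word occurrences of `keyword` in each cell."""
    # re.escape makes characters such as '.' match literally instead of as regex syntax.
    pattern = r"\b{}\b".format(re.escape(keyword))
    # str.count counts regex matches, equivalent to str.findall(pattern).str.len().
    return text.str.count(pattern)


df = pd.DataFrame({"Text": ["Dog Cat Catman", "Cat Cat", "Mouse Cat", "Dog"]})
for x in ("Cat", "Mouse"):
    df[x] = count_whole_words(df["Text"], x)
print(df)

# Without escaping, '.' in a keyword matches any character.
cells = pd.Series(["Cat Cot"])
print(cells.str.count(r"\bC.t\b"))      # counts both 'Cat' and 'Cot'
print(count_whole_words(cells, "C.t"))  # looks for the literal text 'C.t'
```

One caveat: `\b` only marks a boundary next to a word character, so this pattern behaves as expected for keywords that start and end with letters, digits, or underscores.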

edited Mar 04 '19 at 07:28

answered Mar 04 '19 at 07:08

jezrael


Hi @jezrael, thanks for the solution. I still get the error from above though. Any idea why? – Rachel Mar 04 '19 at 07:11
Sorry, I don't quite get it. With both solutions I get the same error. How do I apply `copy()`? – Rachel Mar 04 '19 at 07:16

@Rachel - Could you show the 3 lines of your code before the error? – jezrael Mar 04 '19 at 07:17
I get it! I tried to apply it to a slice/copy of my dataframe, namely `df.head(5)`. That didn't work! Thank you! – Rachel Mar 04 '19 at 07:20

@Rachel - you can also check [this](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas/53954986#53954986) for a better explanation of `SettingWithCopyWarning` – jezrael Mar 04 '19 at 07:23
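The root cause in this thread was adding columns to the result of `df.head(5)`, which pandas may treat as a slice of `df`. A minimal sketch of the usual fix, assuming you really want a separate preview frame: take an explicit `.copy()` of the slice before adding columns, so the new object is clearly independent of `df`. Whether and how the warning appears can vary between pandas versions; the variable name `preview` is only illustrative.

```python
import pandas as pd

data = [{"Text": "Dog Cat", "Dog": 1},
        {"Text": "Cat Cat", "Dog": 0},
        {"Text": "Mouse Cat", "Dog": 0},
        {"Text": "Dog", "Dog": 1}]
df = pd.DataFrame(data)

# df.head(5) can be a slice of df; .copy() makes an independent DataFrame,
# so adding columns to it does not trigger SettingWithCopyWarning.
preview = df.head(5).copy()
for x in ("Cat", "Mouse"):
    preview[x] = preview["Text"].str.count(x)

print(preview)
print(df.columns.tolist())  # df itself keeps only its original columns
```

If the goal is to add the columns to the full DataFrame, run the loop on `df` itself and call `head(5)` only when displaying the result.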
