1

I believe to have a simple problem. I have a pandas dataframe df looking quite similar to this:

data = [{"Text" : "Dog", "Dog" : 1},
        {"Text" : "Cat", "Dog" : 0}, 
        {"Text" : "Mouse", "Dog" : 0}, 
        {"Text" : "Dog", "Dog" : 1}]

df = pd.DataFrame(data)

I am trying to search the column Text for a number of keywords and count how many times they appear in each cell. The result is supposed to be stored in a new column that shows how many times the specific keyword was found. The result is supposed to be just like the Dog column.

I tried using pandas str.count. It works just fine. But in the moment I try to store the result in a new column, I run in to trouble:

mykewords = ('Cat', 'Mouse')
df['Cat'] = df.Text.str.count("Cat")

I get the following error message:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

I have two questions:

  1. What am I doing wrong and how can I solve it?
  2. How can loop through all keywords in mykeywords and get a column each?

Thank you very much for any help in advance!

Rachel
  • 1,937
  • 7
  • 31
  • 58

2 Answers2

2

Just update the pandas with the lastest version and try below code. It's works like a charm for me.

import pandas as pd
data = [{"Text" : "Dog", "Dog" : 1},
        {"Text" : "Cat", "Dog" : 0}, 
        {"Text" : "Mouse", "Dog" : 0}, 
        {"Text" : "Dog", "Dog" : 1}]

df = pd.DataFrame(data)
mykewords = ['Cat', 'Mouse']
for i in mykewords:
    df[i] = df.Text.str.count(i)
Parul Garg
  • 102
  • 1
  • 10
0

If possible multiple values in text and need count values:

mykewords = ('Cat', 'Mouse')
for x in mykewords:
    df[x] = df.Text.str.count(x)

Better solution is use words boundaries with Series.str.findall and Series.str.len:

for x in mykewords:
    df[x] = df.Text.str.findall(r"\b{}\b".format(x)).str.len()

Difference in solutions:

data = [{"Text" : "Dog Cat Catman", "Dog" : 1},
        {"Text" : "Cat Cat", "Dog" : 0}, 
        {"Text" : "Mouse Cat", "Dog" : 0}, 
        {"Text" : "Dog", "Dog" : 1}]

df = pd.DataFrame(data)
df1 = df.copy()
print (df)
   Dog            Text
0    1  Dog Cat Catman
1    0         Cat Cat
2    0       Mouse Cat
3    1             Dog

mykewords = ('Cat', 'Mouse')

for x in mykewords:
    df[x] = df.Text.str.findall(r"\b{}\b".format(x)).str.len()
print (df)
   Dog            Text  Cat  Mouse
0    1  Dog Cat Catman    1      0 <-not match Catman
1    0         Cat Cat    2      0
2    0       Mouse Cat    1      1
3    1             Dog    0      0

for x in mykewords:
    df1[x] = df1.Text.str.count(x)
print (df1)
   Dog            Text  Cat  Mouse
0    1  Dog Cat Catman    2      0 <-match Catman
1    0         Cat Cat    2      0
2    0       Mouse Cat    1      1
3    1             Dog    0      0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hi @jezrael, thanks for the solution. I still get the error from above though. Any idea why? – Rachel Mar 04 '19 at 07:11
  • sorry, I don't quite get it. With both solutions, I get the same error. How do I apply `copy()` – Rachel Mar 04 '19 at 07:16
  • 1
    @Rachel - Is possible seen your code before error, 3 lines? – jezrael Mar 04 '19 at 07:17
  • I get it! I tried to apply it to a slice/copy of my dataframe - namely `df.head(5)`. That didn't work! Thank you! – Rachel Mar 04 '19 at 07:20
  • 1
    @Rachel - also you can check [this](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas/53954986#53954986) for better explanation of `SettingWithCopyWarning` – jezrael Mar 04 '19 at 07:23