1

For each row in the Text column of my df, I want to do the following:

  1. Highlight the keywords gross,suck,singing & ponzi

  2. Count the number of keywords in each row and store them in a Count column

import pandas as pd

data = {'Text': ['The bread tastes good','Tuna is gross','Teddy is a beach bum','Angela suck at singing!','oneCoin was a ponzi scheme'],
        'ID': [1001,1002,1003,1004,1005]
        }

df = pd.DataFrame(data, columns = ['ID', 'Text'])

print(df)


The desired output should include the Count column and look like this :

enter image description here

My attempt (not the best! you can ignore this):

# keyword list
key_words = ['gross','suck','singing','ponzi']

# highlight the keywords
df['Text'].applymap(lambda x: "background-color: yellow" if x else "")

# count the keywords present in each row

df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))


All attempts highly appreciated!

RayX500
  • 247
  • 2
  • 10

2 Answers2

1

Use str, find all. That will give you a list. count elements in each list using str.len()

df['count']=df['Text'].str.findall('|'.join(key_words)).str.len()
df
wwnde
  • 26,119
  • 6
  • 18
  • 32
1

Use Series.str.count:

>>> df['Text'].str.count(fr"\b(?:{'|'.join(key_words)})\b")
0    0
1    1
2    0
3    2
4    1
Name: Text, dtype: int64

\b is a word boundary, you can get whole word count with it.

You can't highlight separate words in Jupyter notebook. You can extract the words into a separate column:

df['Matches'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b")
Ryszard Czech
  • 18,032
  • 4
  • 24
  • 37
  • 1
    thanks, what about the highlighting part? or can we create a new column `Keyword_Present` like in the figure above. – RayX500 May 28 '21 at 23:05
  • @RickyTricky Sorry, no highlighting. `df['Keyword_Present'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b").str.join(' ')` can be used instead. – Ryszard Czech May 29 '21 at 00:26