How to highlight and count specific keywords in a pandas dataframe

Question

For each row in the Text column of my df, I want to do the following:

Highlight the keywords gross, suck, singing and ponzi.
Count the number of keywords in each row and store the result in a Count column.

import pandas as pd

data = {'Text': ['The bread tastes good','Tuna is gross','Teddy is a beach bum','Angela suck at singing!','oneCoin was a ponzi scheme'],
        'ID': [1001,1002,1003,1004,1005]
        }

df = pd.DataFrame(data, columns = ['ID', 'Text'])

print(df)

The desired output should include the Count column and look like this:

My attempt (not the best! you can ignore this):

# keyword list
key_words = ['gross','suck','singing','ponzi']

# highlight the keywords
df['Text'].applymap(lambda x: "background-color: yellow" if x else "")

# count the keywords present in each row

df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))

All attempts highly appreciated!

`df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))`? — Wiktor Stribiżew, May 28 '21 at 22:31
@WiktorStribiżew- Thanks, that part works fine! what about flagging the `key_words`? — RayX500, May 28 '21 at 22:37
Where do you need to highlight them? In a Linux terminal? In Jupyter notebook? — Wiktor Stribiżew, May 28 '21 at 22:42
It [looks like it is impossible](https://stackoverflow.com/questions/49961211/python-pandas-highlight-matching-text-and-row). — Wiktor Stribiżew, May 28 '21 at 22:46
@WiktorStribiżew, Okay, what about we create another column with the `keywords_Present` in each row (see the revised figure- `keyword_Present`), is this possible? — RayX500, May 28 '21 at 23:02

score 1 · Answer 1 · answered May 28 '21 at 22:34

Use str.findall, which returns a list of matches for each row, then count the elements in each list with str.len():

df['count'] = df['Text'].str.findall('|'.join(key_words)).str.len()
df
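A self-contained sketch of the approach above, using the sample data from the question. The `re.escape` call is an addition that is not part of the original answer; it assumes keywords should be matched literally even if they contain regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'ID': [1001, 1002, 1003, 1004, 1005],
    'Text': ['The bread tastes good', 'Tuna is gross', 'Teddy is a beach bum',
             'Angela suck at singing!', 'oneCoin was a ponzi scheme'],
})
key_words = ['gross', 'suck', 'singing', 'ponzi']

# Assumption: escape each keyword so characters like '.' or '+' match literally
pattern = '|'.join(re.escape(word) for word in key_words)

# findall returns one list of matches per row; str.len counts the list items
df['count'] = df['Text'].str.findall(pattern).str.len()
print(df)
```

Note that this pattern has no word boundaries, so a keyword embedded in a longer word is counted as well.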

wwnde


Thanks, any luck with the highlighting part of the question? – RayX500 May 28 '21 at 22:38

score 1 · Accepted Answer · answered May 28 '21 at 22:48

Use Series.str.count:

>>> df['Text'].str.count(fr"\b(?:{'|'.join(key_words)})\b")
0    0
1    1
2    0
3    2
4    1
Name: Text, dtype: int64

\b matches a word boundary, so only whole-word occurrences of the keywords are counted.
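To illustrate the effect of the word boundary, here is a small comparison on made-up sample strings (the `samples` Series is hypothetical and not from the question):

```python
import pandas as pd

key_words = ['gross', 'suck', 'singing', 'ponzi']
alternation = '|'.join(key_words)

# Hypothetical strings where keywords appear inside longer words
samples = pd.Series(['Tuna is gross', 'grossly engrossed', 'he sucks'])

# Without \b, substrings such as the "gross" in "engrossed" are counted
loose = samples.str.count(alternation)

# With \b on both sides, only standalone keywords are counted
strict = samples.str.count(fr"\b(?:{alternation})\b")

print(pd.DataFrame({'text': samples, 'loose': loose, 'strict': strict}))
```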

pandas styling applies to whole cells, so you can't highlight individual words inside a cell in a Jupyter notebook with it. You can extract the matched words into a separate column instead:

df['Matches'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b")
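If visual highlighting is still wanted, one possible workaround that is not taken from the answers here is to wrap each match in an HTML `<mark>` tag and render the frame as raw HTML. This is only a sketch: it assumes the output is viewed somewhere that renders HTML (for example a notebook via `IPython.display.HTML`), and the names `highlighted` and `table_html` are illustrative.

```python
import html
import re
import pandas as pd

key_words = ['gross', 'suck', 'singing', 'ponzi']
pattern = r"\b(?:{})\b".format("|".join(re.escape(w) for w in key_words))

df = pd.DataFrame({
    'ID': [1002, 1004],
    'Text': ['Tuna is gross', 'Angela suck at singing!'],
})

# Escape the raw text first so any '<' or '&' in it cannot break the markup
escaped = df['Text'].map(html.escape)

# Wrap every whole-word match in <mark>...</mark>
marked = escaped.str.replace(
    pattern, lambda m: f"<mark>{m.group(0)}</mark>", regex=True
)
highlighted = df.assign(Text=marked)

# escape=False keeps the <mark> tags as real HTML instead of escaping them
table_html = highlighted.to_html(escape=False, index=False)

# In a notebook (assumption): from IPython.display import HTML; HTML(table_html)
```

The original `df` is left unchanged, so counting can still run on the plain text.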

Ryszard Czech


thanks, what about the highlighting part? or can we create a new column `Keyword_Present` like in the figure above. – RayX500 May 28 '21 at 23:05
@RickyTricky Sorry, no highlighting. `df['Keyword_Present'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b").str.join(' ')` can be used instead. – Ryszard Czech May 29 '21 at 00:26
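A runnable sketch of the `Keyword_Present` suggestion from the comment above, using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1001, 1002, 1003, 1004, 1005],
    'Text': ['The bread tastes good', 'Tuna is gross', 'Teddy is a beach bum',
             'Angela suck at singing!', 'oneCoin was a ponzi scheme'],
})
key_words = ['gross', 'suck', 'singing', 'ponzi']
pattern = fr"\b(?:{'|'.join(key_words)})\b"

# findall yields a list of matched keywords per row;
# str.join turns each list into one space-separated string
df['Keyword_Present'] = df['Text'].str.findall(pattern).str.join(' ')
print(df)
```

Rows without any keyword end up with an empty string in `Keyword_Present`.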
