Create and fill a DataFrame column based on conditions

Question

I have a DataFrame and need to create a new column whose values are the number of words from a list that are found in each row's text. I'm trying the code below:

import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})

list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']

for index, row in df.iterrows():
    text = row['text']
    count_found_words = 0
    for word in list_of_words:
        found_words = re.findall(word, text)
        if len(found_words) > 0:
            count_found_words += 1
    # Assigns one scalar to the whole column on every iteration
    df['found_words'] = count_found_words

This code does create a new column, but it fills every row with the last value of `count_found_words` from the loop.

Is there a right way to do this?

Does your solution work? What needs to be improved to make it *better*? — wwii, May 11 '21 at 18:13
Doesn't work, it fills all the columns with the same value, which comes from the last loop — Lucas Ferreira de Oliveira, May 11 '21 at 18:15
Your [mre] should always include a **minimal** example of the data. The *data* can be made up as long as it illustrates the issue. [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — wwii, May 11 '21 at 18:15
Change `df['found_words'] = count_found_words` to `df.loc[index,'found_words'] = count_found_words` — wwii, May 11 '21 at 18:20
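The change suggested in the last comment can be sketched as follows. This is a minimal sketch on the question's sample data, not a tested fix from the thread; creating the column with zeros before the loop is an assumption added here so that the column keeps an integer dtype.

```python
import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']

# Assumption: create the column up front so it stays integer-typed
df['found_words'] = 0

for index, row in df.iterrows():
    count_found_words = 0
    for word in list_of_words:
        if re.findall(word, row['text']):
            count_found_words += 1
    # Write to the current row only, instead of overwriting the whole column
    df.loc[index, 'found_words'] = count_found_words

print(df['found_words'].tolist())  # [1, 0, 2]
```

The loop still visits every row in Python, so the vectorized answers are likely to scale better on large frames.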

score 2 · Accepted Answer · answered May 11 '21 at 18:20

pattern = fr"\b({'|'.join(list_of_words)})\b"

df["found_words"] = df.text.str.findall(pattern).str.len()
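As a note on what these two lines do: the pattern is an alternation of the listed words wrapped in word boundaries, and `str.findall(...).str.len()` counts the matches per row. The sketch below runs it on the question's sample data. Two things in it are additions rather than part of the answer: `re.escape`, which guards against words containing regex metacharacters, and the `distinct` variant, which counts each word once, as the loop in the question does.

```python
import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']

# Whole-word alternation; re.escape is an added safeguard, not in the answer
pattern = r"\b(" + "|".join(re.escape(w) for w in list_of_words) + r")\b"

df["found_words"] = df["text"].str.findall(pattern).str.len()
print(df["found_words"].tolist())  # [1, 0, 2]

# findall counts every occurrence, so a repeated word is counted twice
repeated = pd.Series(['magic, magic'])
occurrences = repeated.str.findall(pattern).str.len().tolist()                   # [2]
distinct = repeated.str.findall(pattern).apply(lambda m: len(set(m))).tolist()  # [1]
```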

score 1 · Answer 2 · answered May 11 '21 at 18:25

Or you can try:

df['found_words'] = df.text.str.split(',').apply(
    lambda x: sum(i in list_of_words for i in x))
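One caveat with splitting on `','`: any space after a comma stays attached to the token, so `' water'` does not equal `'water'`. The sketch below is a variation on the snippet above, not the answerer's code, that strips each token before the lookup. The `word_set` name and the use of a set are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']
word_set = set(list_of_words)  # hypothetical helper: faster membership tests

df['found_words'] = df['text'].str.split(',').apply(
    lambda tokens: sum(token.strip() in word_set for token in tokens))
print(df['found_words'].tolist())  # [1, 0, 2]

# Effect of the leading space when the listed word is not the first token
sample = pd.Series(['rainbow, water'])
plain = sample.str.split(',').apply(
    lambda x: sum(i in list_of_words for i in x)).tolist()        # [0]
stripped = sample.str.split(',').apply(
    lambda x: sum(i.strip() in word_set for i in x)).tolist()     # [1]
```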

Answered by Nk03

score -1 · Answer 3 · answered May 11 '21 at 18:18

You can define a function `count_words` that takes a text and returns `count_found_words`, then apply it to the column with `df['found_words'] = df['text'].map(count_words)`.
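The answer does not show the function itself, so the body of `count_words` below is a hypothetical sketch that follows the question's logic: count how many listed words occur at least once in the text. Using `re.search` with `re.escape` instead of `re.findall` is a choice made here, not something the answer specifies.

```python
import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']

def count_words(text):
    """Return how many words from list_of_words occur at least once in text."""
    # Like the question's loop, this matches substrings, not whole words
    return sum(1 for word in list_of_words if re.search(re.escape(word), text))

# map calls count_words once per row and builds the new column from the results
df['found_words'] = df['text'].map(count_words)
print(df['found_words'].tolist())  # [1, 0, 2]
```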

Answered by Mathieu
