I need to change a label if at least one of a data frame's columns contains one of the following words:

check_words=['pit','stop','PIT','STOP','Pit','Stop']

A sample of rows in my dataframe is:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([['Ferrari was hit by a radio communication blackout' , 'Scuderia Ferrari trying a double pit stop', ' If Ferrari takes nothing else away from the 2019 season, it must learn from its mistakes across the season'], ['We may use the following original news sources for stories', 'Sebastian Vettel insisted he trusts in Ferrari', 'During the recent Grand Prix of Italy, the Scuderia Ferrari team managed to execute one of the fastest pit stops ever performed during a Formula 1 race']]),
                   columns=['Text1', 'Short','Data'])

I created a column Label as follows:

df['Label']='No Pit'

The label should identify whether any column in a row contains one of the words in the list above. If a row contains a word from that list, I need to change its label to 'Pit'.

Could you please tell me how I can change it?

3 Answers

You can do this by making a DataFrame from the list and then adding it to your main DataFrame. Here is the code:

import pandas as pd
import numpy as np

check_words=['pit','stop','PIT','STOP','Pit','Stop']

df = pd.DataFrame(np.array([['Ferrari was hit by a radio communication blackout' , 'Scuderia Ferrari trying a double pit stop', ' If Ferrari takes nothing else away from the 2019 season, it must learn from its mistakes across the season'], ['We may use the following original news sources for stories', 'Sebastian Vettel insisted he trusts in Ferrari', 'During the recent Grand Prix of Italy, the Scuderia Ferrari team managed to execute one of the fastest pit stops ever performed during a Formula 1 race']]),columns=['Text1', 'Short','Data'])

df['Label'] = pd.DataFrame(check_words)

print(df)

Output

0   Ferrari was hit by a radio communication blackout   Scuderia Ferrari trying a double pit stop   If Ferrari takes nothing else away from the 2...    pit
1   We may use the following original news sources...   Sebastian Vettel insisted he trusts in Ferrari  During the recent Grand Prix of Italy, the Scu...   stop

Try this:

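# Check only the three text columns (all rows); iloc[:3] would select rows instead.
l = pd.DataFrame(np.vectorize(lambda r: any(x in r for x in check_words))(df.iloc[:, :3].values))
# axis must be passed by keyword in recent pandas; map turns each boolean into a label.
df['Label'] = l.any(axis=1).map(lambda x: 'pit' if x else 'not pit')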
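
As an alternative sketch that stays within pandas, the same row-wise check can be written with Series.str.contains and case=False, which lets the word list shrink to its lowercase forms. The sample frame below is a shortened, hypothetical version of the question's data with one extra non-matching row, and text_cols, pattern, and has_word are names chosen here for illustration. The labels 'Pit' and 'No Pit' follow the question.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data: shortened versions of the question's rows,
# plus a third row with no match, added here only for illustration.
df = pd.DataFrame(
    {
        "Text1": [
            "Ferrari was hit by a radio communication blackout",
            "We may use the following original news sources for stories",
            "Qualifying starts on Saturday",
        ],
        "Short": [
            "Scuderia Ferrari trying a double pit stop",
            "Sebastian Vettel insisted he trusts in Ferrari",
            "Rain is expected",
        ],
        "Data": [
            "it must learn from its mistakes across the season",
            "one of the fastest pit stops ever performed",
            "The forecast may change",
        ],
    }
)

check_words = ["pit", "stop"]  # case variants are handled by case=False below
text_cols = ["Text1", "Short", "Data"]

# One alternation pattern; re.escape guards against regex metacharacters.
pattern = "|".join(re.escape(w) for w in check_words)

# True for each row where any text column contains any of the words.
has_word = df[text_cols].apply(
    lambda col: col.str.contains(pattern, case=False, regex=True)
).any(axis=1)

df["Label"] = np.where(has_word, "Pit", "No Pit")
print(df["Label"].tolist())
```

Restricting the check to text_cols matters if a Label column holding 'No Pit' already exists, since that value itself contains 'Pit' and would otherwise match.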

Hope this helps !!!

luckyCasualGuy

I use my own function (chk_word) to check each row and write the first matching word from check_words as its label.

def chk_word(row):
    for c in check_words:
        if row.str.contains(c).any():
            return c
df['Label'] = df.apply(chk_word, axis=1)
df
Text1   Short   Data    Label
0   Ferrari was hit by a radio communication blackout   Scuderia Ferrari trying a double pit stop   If Ferrari takes nothing else away from the 2...    pit
1   We may use the following original news sources...   Sebastian Vettel insisted he trusts in Ferrari  During the recent Grand Prix of Italy, the Scu...   pit
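
If the label should read 'Pit' or 'No Pit' as in the question, rather than the matched word, one possible follow-up is to map the result of chk_word. This is only a sketch on a hypothetical two-row frame (the second row is made up so that a non-match appears), and regex=False is an added assumption so that the words are treated as plain substrings.

```python
import pandas as pd

check_words = ['pit', 'stop', 'PIT', 'STOP', 'Pit', 'Stop']

# Hypothetical two-row frame: the second row contains none of the words.
df = pd.DataFrame({
    'Text1': ['Ferrari was hit by a radio communication blackout',
              'Sebastian Vettel insisted he trusts in Ferrari'],
    'Short': ['Scuderia Ferrari trying a double pit stop',
              'We may use the following original news sources'],
})

def chk_word(row):
    # Return the first word from check_words found anywhere in the row, else None.
    for c in check_words:
        if row.str.contains(c, regex=False).any():
            return c
    return None

matched = df.apply(chk_word, axis=1)

# notna() is True only for rows where some word matched.
df['Label'] = matched.notna().map({True: 'Pit', False: 'No Pit'})
print(df['Label'].tolist())
```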
r-beginners