I've got this pandas DataFrame, which is a description of plays during a football match:
play_id | type | Text |
---|---|---|
1 | pass | Jon pass complete to Ben. |
2 | pass | Clock 14:52, Jon pass complete to Mitch. |
3 | rush | Frank rush. |
My objective is to create a new column called "passer" with a script that will go through the description in the "text' column, and will take the name that is placed before the word 'pass'. So I first used this:
df['passer'] = df['Text'].str.extract(r'(.*?)pass', expand=False).str.strip()
Which gives me this:
play_id | type | Text | passer |
---|---|---|---|
1 | pass | Jon pass complete to Ben. | Jon |
2 | pass | Clock 14:52, Jon pass complete to Mitch. | Clock 14:52, Jon |
3 | rush | Frank rush. | NaN |
It works correctly for the 1st and 3rd playid, but not for the second one, as it takes the clock, that can sometimes be included in the description.
I have tried to implement conditions on the creation of my column, where the code checks if 'Clock' is included in the description or not, and use the correct regex, but this does not work:
conditions = [
(np.where(df.Text.str.contains('Clock', case=False))),
(np.where(~df.Text.str.contains('Clock', case=False)))
]
choices = [
df['Text'].str.extract(r', (.*?) pass', expand=False).str.strip(),
df['Text'].str.extract('(.*?) pass', expand=False).str.strip()
]
df['passerNEW'] = np.select(conditions, choices, default='NaN')
df
I get the following error:
TypeError: invalid entry 0 in condlist: should be boolean ndarray
Is there a way to make this function work? That seemed like a good way to do it, as in other cases I can have three different conditions to check in order to know which regex to use.