I have a dataframe:

   Unnamed: 0                       game score home_odds draw_odds away_odds country                 league             datetime
0           0  Sport Recife - Imperatriz   2:2      1.36      4.31      7.66  Brazil  Copa do Nordeste 2020  2020-02-07 00:00:00
1           1           ABC - America RN   2:1      2.62      3.30      2.48  Brazil  Copa do Nordeste 2020  2020-02-02 22:00:00
2           2  Frei Paulistano - Nautico   0:2      5.19      3.58      1.62  Brazil  Copa do Nordeste 2020  2020-02-02 00:00:00
3           3    Botafogo PB - Confianca   1:1      2.06      3.16       3.5  Brazil  Copa do Nordeste 2020  2020-02-02 22:00:00
4           4          Fortaleza - Ceara   1:1      2.19      2.98      3.38  Brazil  Copa do Nordeste 2020  2020-02-02 22:00:00

I am performing the following operations on it:

df['game'] = df['game'].astype(str).str.replace('(\(\w+\))', '', regex=True)
df['league'] = df['league'].astype(str).str.replace('(\s\d+\S\d+)$', '', regex=True)
df['game'] = df['game'].astype(str).str.replace('(\s\d+\S\d+)$', '', regex=True)
df[['home_team', 'away_team']] = df['game'].str.split(' - ', expand=True, n=1)
df[['home_score', 'away_score']] = df['score'].str.split(':', expand=True)
df['away_score'] = df['away_score'].astype(str).str.replace('[a-zA-Z\s\D]', '', regex=True)
df['home_score'] = df['home_score'].astype(str).str.replace('[a-zA-Z\s\D]', '', regex=True)
df = df[df.home_score != "."]
df = df[df.home_score != ".."]
df = df[df.home_score != "."]
df = df[df.home_odds != "-"]
df = df[df.draw_odds != "-"]
df = df[df.away_odds != "-"]
m = df[['home_odds', 'draw_odds', 'away_odds']].astype(str).agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
n = df[['home_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
o = df[['away_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
df = df[~m]
df = df[~n]
df = df[~o]
df = df[df.home_score != '']
df = df[df.away_score != '']
df = df.dropna()

However, when I do that, I get this warning:

UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df = df[~n]
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df = df[~o]

How do I resolve this?

leonardo

1 Answer

I think you can try changing:

n = df[['home_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
o = df[['away_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)

to:

n = df['home_score'].str.count('-').ne(0)
o = df['away_score'].str.count('-').ne(0)

This should be equivalent to the following (without `~`, because the masks are negated later in `df[~n]` and `df[~o]`):

n = df['home_score'].str.contains('-')
o = df['away_score'].str.contains('-')
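
The warning itself most likely appears because `n` and `o` are built before `df = df[~m]` drops rows, so their index no longer matches the filtered frame. A minimal sketch, assuming a small made-up frame (the sample data and the name `result` are illustrative, not from the question), that builds every mask on the same frame and applies them in one step:

```python
import warnings
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({
    "home_odds": ["1.36", "5/2", "2.06", "2.19"],
    "draw_odds": ["4.31", "7/2", "3.16", "2.98"],
    "away_odds": ["7.66", "1/2", "3.5", "3.38"],
    "home_score": ["2", "1", "-", "1"],
    "away_score": ["2", "0", "1", "-"],
})

odds = df[["home_odds", "draw_odds", "away_odds"]].astype(str)
# Same logic as the question's `m`: every odds column contains a '/'.
m = odds.apply(lambda col: col.str.contains("/")).all(axis=1)
n = df["home_score"].str.contains("-")
o = df["away_score"].str.contains("-")

# All three masks share df's index, so one combined filter needs no reindexing.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)  # a reindex warning would raise here
    result = df[~(m | n | o)]

print(result)
```

If the filters have to stay as separate steps, recomputing each mask from the already filtered `df` right before using it should have the same effect.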

You should also change:

df = df[df.home_score != "."]
df = df[df.home_score != ".."]
df = df[df.home_score != "."]
df = df[df.home_odds != "-"]
df = df[df.draw_odds != "-"]
df = df[df.away_odds != "-"]

to:

df = df[~df.home_score.isin([".", ".."]) &
         df[['home_odds','draw_odds','away_odds']].ne("-").all(axis=1)]
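
The chained filters keep a row only when `home_score` is neither "." nor ".." and none of the three odds columns is "-", which means the conditions are joined with `&` and the odds check uses `.all(axis=1)`. A minimal sketch on made-up data (the sample values and the names `chained`, `keep` and `combined` are illustrative) that compares the chained version with a single mask:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({
    "home_score": ["2", ".", "..", "1", "0"],
    "home_odds": ["1.36", "2.62", "5.19", "-", "2.19"],
    "draw_odds": ["4.31", "3.30", "3.58", "3.16", "-"],
    "away_odds": ["7.66", "2.48", "1.62", "3.5", "3.38"],
})

# Chained version, as in the question.
chained = df[df.home_score != "."]
chained = chained[chained.home_score != ".."]
chained = chained[chained.home_odds != "-"]
chained = chained[chained.draw_odds != "-"]
chained = chained[chained.away_odds != "-"]

# Single mask: score is not a placeholder AND no odds column is "-".
odds_cols = ["home_odds", "draw_odds", "away_odds"]
keep = ~df["home_score"].isin([".", ".."]) & df[odds_cols].ne("-").all(axis=1)
combined = df[keep]

# Compare the two results.
print(combined.equals(chained))
```

Because `keep` is computed once on the unfiltered frame, its index always matches `df`, so this form also avoids the reindexing warning.
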
jezrael