
I have the following dataframe, df_melt:

    MatchID GameWeek        Date              Team  Home        AgainstTeam
0     46605        1  2019-08-09         Liverpool  Home       Norwich City
1     46605        1  2019-08-09      Norwich City  Away          Liverpool
2     46606        1  2019-08-10   AFC Bournemouth  Home   Sheffield United
3     46606        1  2019-08-10  Sheffield United  Away    AFC Bournemouth
4     46607        1  2019-08-10           Burnley  Home        Southampton
..      ...      ...         ...               ...   ...                ...
540   46875       28         TBC               Aston Villa  Home   
541   46875       28         TBC          Sheffield United  Away   

Clearly there is a problem, with 'TBC' values in a few rows.

How do I drop those flawed rows, or otherwise fix them?

8-Bit Borges
  • What are `df_pm` and `df_melt`? You should always include sample data and expected output. On another note, the error means that multiple items satisfy your condition; you can use `item()` to reduce a selection to a single number, but only when it contains exactly one element. – Quang Hoang May 07 '20 at 20:07
  • @Quang Hoang edited the question. hope it helps. – 8-Bit Borges May 07 '20 at 20:10
  • `itertuples()` almost always screams unnecessary to me. Maybe you should brief what you are trying to do, and also your **expected** output. – Quang Hoang May 07 '20 at 20:14
  • I tried to be brief, but you asked for complementary code...if I remove `itertuples()`, I get `AttributeError: 'str' object has no attribute 'MatchID'` – 8-Bit Borges May 07 '20 at 20:17
  • @QuangHoang I've found the source of error in the data. Please see above. Care to answer the best way of fixing it? – 8-Bit Borges May 07 '20 at 22:20

2 Answers


You can use dateutil to test the validity of your dates.

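from dateutil.parser import parse

def is_valid_date(s):
    """Return True if dateutil can parse s as a date, False otherwise."""
    try:
        parse(s)
        return True
    # Catch only parsing failures (ParserError subclasses ValueError)
    # rather than using a bare except, which would also hide unrelated errors.
    except (ValueError, OverflowError, TypeError):
        return False

# Keep only the rows whose Date string parses; rows with "TBC" are dropped.
df_melt = df_melt[df_melt.Date.apply(is_valid_date)]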
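If you would rather stay within pandas, a similar check can be sketched with `pd.to_datetime` and `errors="coerce"`, which turns unparseable strings such as "TBC" into `NaT` instead of raising. This is an alternative to the dateutil approach above, not part of it; the small frame below is made-up sample data that only mirrors the question's column layout.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's df_melt.
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605, 46875, 46875],
    "Date": ["2019-08-09", "2019-08-09", "TBC", "TBC"],
    "Team": ["Liverpool", "Norwich City", "Aston Villa", "Sheffield United"],
})

# Strings that cannot be parsed as dates ("TBC") become NaT.
parsed = pd.to_datetime(df_melt["Date"], errors="coerce")

# "Fix" option: keep every row, with unconfirmed dates stored as NaT.
df_fixed = df_melt.assign(Date=parsed)

# "Drop" option: remove the rows whose date could not be parsed.
df_valid = df_fixed.dropna(subset=["Date"])

print(df_valid)
```

Keeping the `NaT` rows may be preferable if the unconfirmed fixtures still matter for anything other than date arithmetic.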
Tuan Ta

I assume that "TBC" means the date of the game has not been set yet ("To Be Confirmed"). If you're going to use the dates in analysis, I'd recommend filtering out the rows that have "TBC" as the date:

df_melt_no_tbc = df_melt[df_melt.Date != "TBC"]

You can do this in a few other ways too! See this post for a few other alternatives. Here's the fully worked example with output:

>>> import pandas as pd
>>> 
>>> columns =["MatchID", "GameWeek", "Date", "Team", "Home", "AgainstTeam"]
>>> data = [["1", "1", "01-02-2020", "TeamA", "Here", "TeamB"],
...         ["1", "1", "TBC", "TeamB", "Here", "TeamA"]]
>>> df_melt = pd.DataFrame(data, columns=columns)
>>> print(df_melt)
  MatchID GameWeek        Date   Team  Home AgainstTeam
0       1        1  01-02-2020  TeamA  Here       TeamB
1       1        1         TBC  TeamB  Here       TeamA
>>> df_melt_no_tbc = df_melt[df_melt.Date != "TBC"]
>>> print(df_melt_no_tbc)
  MatchID GameWeek        Date   Team  Home AgainstTeam
0       1        1  01-02-2020  TeamA  Here       TeamB
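For reference, here is a rough sketch of a few equivalent ways to express the same filter, reusing the toy data from the example above. These are my own picks of standard pandas idioms and not necessarily the ones covered in the linked post; the result names (`no_tbc_isin` and so on) are just illustrative.

```python
import pandas as pd

columns = ["MatchID", "GameWeek", "Date", "Team", "Home", "AgainstTeam"]
data = [["1", "1", "01-02-2020", "TeamA", "Here", "TeamB"],
        ["1", "1", "TBC", "TeamB", "Here", "TeamA"]]
df_melt = pd.DataFrame(data, columns=columns)

# 1. Negated isin: handy if more placeholder values need excluding later.
no_tbc_isin = df_melt[~df_melt["Date"].isin(["TBC"])]

# 2. query: the same condition written as an expression string.
no_tbc_query = df_melt.query('Date != "TBC"')

# 3. drop: look up the index labels of the bad rows and drop them.
no_tbc_drop = df_melt.drop(df_melt.index[df_melt["Date"] == "TBC"])

print(no_tbc_isin)
```

All three should give the same frame as the boolean mask `df_melt[df_melt.Date != "TBC"]`; which one to use is mostly a matter of taste.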
Jesse