
My dataframe is as below

df
    time                 home_team     away_team           full_time_result                   both_teams_to_score        double_chance                         League
--  -------------------  ------------  ------------------  ---------------------------------  -------------------------  ------------------------------------  ----------------
 0  2021-01-08 19:45:00  Charlton      Accrington Stanley  {'1': 2370, 'X': 3400, '2': 3000}  {'yes': 1900, 'no': 1900}  {'1X': 1360, '12': 1300, '2X': 1530}  England League 1
 1  2021-01-09 12:30:00  Lincoln City  Peterborough        {'1': 2290, 'X': 3400, '2': 3100}  {'yes': 1800, 'no': 1950}  {'1X': 1360, '12': 1300, '2X': 1570}  England League 1
 2  2021-01-09 13:00:00  Gillingham    Burton Albion       {'1': 2200, 'X': 3400, '2': 3300}  {'yes': 1700, 'no': 2040}  {'1X': 1330, '12': 1300, '2X': 1610}  England League 1
 3  2021-01-09 17:30:00  Ipswich       Swindon             {'1': None, 'X': None, '2': None}  {'yes': 1750, 'no': 2000}  {'1X': 1220, '12': 1250, '2X': 1900}  England League 1

How can I delete rows containing None? In this example, in the column full_time_result, I want to delete the row whose value is {'1': None, 'X': None, '2': None}.

Thanks

PyNoob
  • As an FYI: since you are already expanding your columns of dicts into separate columns, as per your previous [question](https://stackoverflow.com/q/65588159/7758804), the best option is to use `df_normalized = df_normalized.dropna()` after normalizing the columns. This will be far faster than either of the provided solutions (a sketch of this approach is shown after the comments). – Trenton McKinney Jan 08 '21 at 00:53
  • 1
  • This is exactly what I did while I was waiting for your solutions, but I wanted a more robust code-handling solution; hence, I adopted the solution by @david-erickson. – PyNoob Jan 09 '21 at 00:57
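
A minimal sketch of the approach suggested in the comment above, assuming pandas ≥ 1.0 (for pd.json_normalize), that the dict columns are the three from the sample dataframe, and that df has the default RangeIndex; the name df_normalized mirrors the comment:

import pandas as pd

# Expand each dict column into real columns, then drop any row containing NaN/None.
dict_cols = ['full_time_result', 'both_teams_to_score', 'double_chance']
expanded = [pd.json_normalize(df[c].tolist()).add_prefix(c + '.') for c in dict_cols]
df_normalized = pd.concat([df.drop(columns=dict_cols)] + expanded, axis=1)

# None becomes NaN (or stays None) after normalizing; dropna removes both.
df_normalized = df_normalized.dropna()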

2 Answers


You can create a boolean mask to filter out values of full_time_result with None in '1' and '2'. To extract the values we can use operator.itemgetter, then use __eq__ to check equality, i.e. check whether the extracted tuple is (None, None).

from operator import itemgetter
m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
df[~m]

# Alternative
# m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__ne__)
# df[m]

Details

_.map(itemgetter('1', '2')).map((None, None).__eq__)
# All of this can be written with a lambda in a single line.

_.map(lambda x: itemgetter('1', '2')(x).__eq__((None, None)))

example_dict = {'1': 10, '2': 20}
itemgetter('1', '2')(example_dict)
# (10, 20)

# Since you want to identify values with None, we can leverage __eq__
itemgetter('1', '2')(example_dict).__eq__((10, 20))
# True # equivalent to (10, 20) == (10, 20)
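
Note: itemgetter('1', '2') raises a KeyError if a dict is missing either key. If that can happen in your data, a more defensive variant (a sketch, not part of the benchmark below) uses dict.get, which returns None for missing keys:

# Treat a missing key the same as an explicit None by using dict.get
m = df['full_time_result'].map(lambda d: (d.get('1'), d.get('2')) == (None, None))
df[~m]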

timeit results

# Benchmarking setup
s = pd.Series([{'1':10, '2':20}, {'1':None, '2':None}, {'1':1, '2':2}])
df = s.repeat(1_000_000).to_frame('full_time_result')
df.shape
# (3000000, 1) # 3 million rows, 1 column


# @david's
In [33]: %timeit df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
1.59 s ± 82.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# @Ch3steR's
In [34]: %%timeit
    ...: m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
    ...: df[~m]
    ...:
    ...:
834 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

≈ 2× faster than using the lambda approach

Ch3steR

With lambda x: you go through each row of the specified column. From there, you can perform normal Python operations such as any(), access the values() of each row's dictionary, and check whether any are equal to None. Rows that contain None return True, so we filter those True results out with ~:

df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
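
If you want the same treatment for every dict column, not just full_time_result, one possible extension (a sketch, not part of the original answer; column names are assumed from the question's sample data) combines the per-column masks:

# Flag rows where any dict column contains a None value, then drop those rows.
dict_cols = ['full_time_result', 'both_teams_to_score', 'double_chance']
has_none = df[dict_cols].apply(lambda col: col.apply(lambda d: any(v is None for v in d.values())))
df[~has_none.any(axis=1)]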
David Erickson