
My dataframe is as below

df
    time                 home_team     away_team           full_time_result                   both_teams_to_score        double_chance                         League
--  -------------------  ------------  ------------------  ---------------------------------  -------------------------  ------------------------------------  ----------------
 0  2021-01-08 19:45:00  Charlton      Accrington Stanley  {'1': 2370, 'X': 3400, '2': 3000}  {'yes': 1900, 'no': 1900}  {'1X': 1360, '12': 1300, '2X': 1530}  England League 1
 1  2021-01-09 12:30:00  Lincoln City  Peterborough        {'1': 2290, 'X': 3400, '2': 3100}  {'yes': 1800, 'no': 1950}  {'1X': 1360, '12': 1300, '2X': 1570}  England League 1
 2  2021-01-09 13:00:00  Gillingham    Burton Albion       {'1': 2200, 'X': 3400, '2': 3300}  {'yes': 1700, 'no': 2040}  {'1X': 1330, '12': 1300, '2X': 1610}  England League 1
 3  2021-01-09 17:30:00  Ipswich       Swindon             {'1': None, 'X': None, '2': None}  {'yes': 1750, 'no': 2000}  {'1X': 1220, '12': 1250, '2X': 1900}  England League 1

How can I delete rows containing None? In this example, in the column full_time_result, I want to delete the row whose value is {'1': None, 'X': None, '2': None}.

Thanks

PyNoob
  • As an FYI: since you are already expanding your columns of dicts into separate columns, as per your previous [question](https://stackoverflow.com/q/65588159/7758804), the best option is to use `df_normalized = df_normalized.dropna()` after normalizing the columns. This will be far faster than either of the provided solutions (a sketch of this approach is shown after the comments). – Trenton McKinney Jan 08 '21 at 00:53
  • 1
  • This is exactly what I did while I was waiting for your solutions, but I wanted a more robust code-handling solution; hence, I adopted the solution by @david-erickson. – PyNoob Jan 09 '21 at 00:57
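
A minimal sketch of the approach suggested in the comment above, assuming pandas ≥ 1.0 (for pd.json_normalize), that the dict columns are the three from the sample dataframe, and that df has the default RangeIndex; the name df_normalized mirrors the comment:

import pandas as pd

# Expand each dict column into real columns, then drop any row containing NaN/None.
dict_cols = ['full_time_result', 'both_teams_to_score', 'double_chance']
expanded = [pd.json_normalize(df[c].tolist()).add_prefix(c + '.') for c in dict_cols]
df_normalized = pd.concat([df.drop(columns=dict_cols)] + expanded, axis=1)

# None becomes NaN (or stays None) after normalizing; dropna removes both.
df_normalized = df_normalized.dropna()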

2 Answers


You can create a boolean mask to filter out values of full_time_result with None in '1' and '2'. To extract the values we can use operator.itemgetter, then use __eq__ to check equality, i.e. check whether the extracted tuple is (None, None).

from operator import itemgetter
m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
df[~m]

# Alternative
# m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__ne__)
# df[m]

Details

_.map(itemgetter('1', '2')).map((None, None).__eq__)
# All of this can be written with a lambda in a single line.

_.map(lambda x: itemgetter('1', '2')(x).__eq__((None, None)))

example_dict = {'1': 10, '2': 20}
itemgetter('1', '2')(example_dict)
# (10, 20)

# Since you want to identify values with None, we can leverage __eq__
itemgetter('1', '2')(example_dict).__eq__((10, 20))
# True # equivalent to (10, 20) == (10, 20)
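
Note: itemgetter('1', '2') raises a KeyError if a dict is missing either key. If that can happen in your data, a more defensive variant (a sketch, not part of the benchmark below) uses dict.get, which returns None for missing keys:

# Treat a missing key the same as an explicit None by using dict.get
m = df['full_time_result'].map(lambda d: (d.get('1'), d.get('2')) == (None, None))
df[~m]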

timeit results

# Benchmarking setup
s = pd.Series([{'1':10, '2':20}, {'1':None, '2':None}, {'1':1, '2':2}])
df = s.repeat(1_000_000).to_frame('full_time_result')
df.shape
# (3000000, 1) # 3 million rows, 1 column


# @david's
In [33]: %timeit df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
1.59 s ± 82.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# @Ch3steR's
In [34]: %%timeit
    ...: m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
    ...: df[~m]
    ...:
    ...:
834 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

≈ 2× faster than using the lambda approach

Ch3steR

With lambda x: you go through each row of the specified column. From there, you can perform normal Python operations such as any(), access the values() of each row's dictionary, and check whether any are equal to None. Rows that contain None return True, so we filter those True results out with ~:

df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
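
If you want the same treatment for every dict column, not just full_time_result, one possible extension (a sketch, not part of the original answer; column names are assumed from the question's sample data) combines the per-column masks:

# Flag rows where any dict column contains a None value, then drop those rows.
dict_cols = ['full_time_result', 'both_teams_to_score', 'double_chance']
has_none = df[dict_cols].apply(lambda col: col.apply(lambda d: any(v is None for v in d.values())))
df[~has_none.any(axis=1)]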
David Erickson