Deleting rows with the same name when any of the rows (with the same name) has missing data

Question

I need to remove rows with the same name when any of the rows with that name has missing data.

See the picture for sample data. I would like to remove BOTH rows for the country Belize when any of the rows for Belize has missing info. Here Belize is missing data for 2011, so the 2012 row for Belize needs to be removed too.

What's an efficient way to code this to apply to the whole dataset in Python?

Does this answer your question? [Apply vs transform on a group object](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object) — RichieV, Sep 04 '20 at 13:03
Welcome KC, please take a look at that other question, you can create a column stating if any of the country's rows have `NaN`, then delete those rows — RichieV, Sep 04 '20 at 13:05
Removing the row with NA is easy with .dropna(); how do I also do that for the other row with the same name? — KC_Berlin, Sep 04 '20 at 13:16
Is there a specific issue? Please see [ask], [help/on-topic]. Also, please do not share information as images unless absolutely necessary. See: https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors, https://idownvotedbecau.se/imageofcode, https://idownvotedbecau.se/imageofanexception/. — AMC, Sep 05 '20 at 00:57

score 1 · Answer 1 · answered Sep 04 '20 at 13:11

Try this:

df.dropna(subset = ["Factor A"], inplace=True)
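Note that `dropna` on its own removes only the rows that are themselves missing a value, so the 2012 Belize row would remain. A minimal sketch of one way to extend it, assuming the name column is called `Country` and using illustrative sample values (the names `incomplete` and `cleaned` are made up for this example):

```python
import pandas as pd

# Illustrative sample data; the real dataset is assumed to have the same columns
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Year': [2011, 2012, 2011, 2012],
})

# Names that have at least one row with a missing 'Factor A'
incomplete = df.loc[df['Factor A'].isna(), 'Country'].unique()

# Keep only the rows whose Country is not among those names
cleaned = df[~df['Country'].isin(incomplete)]
print(cleaned)
```

This checks only `Factor A`, as in the line above; to consider every column, the mask `df.isna().any(axis=1)` could be used in place of `df['Factor A'].isna()`.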

Subasri sridhar


score 0 · Answer 2 · answered Sep 04 '20 at 14:19

As mentioned in the comments, you can use `transform` to build a boolean Series and use it as a mask to drop the desired rows.

# sample data; please always provide it in this form so we can paste it into our tests
# you could get it with `df.head().to_dict('list')`
import pandas as pd

df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Factor B': [3.575, 3.794, None, 5.956],
    'Year': [2011, 2012, 2011, 2012]
})

droprows = (
    df.groupby('Country') # group the rows by Country
    .transform(lambda x: x.isna().any())
        # .transform applies a function and broadcasts the resulting scalar
            # to all rows in the group
        # x.isna() returns True if a cell contains NaN, element-wise
        # .any() aggregates and returns a scalar True/False per group and column
        # the line returns a dataframe with the same number of rows as df
            # and one less column: 'Country'
    .any(axis=1) # collapse that result into a single column
)
print(droprows)
# 0    False
# 1    False
# 2     True
# 3     True
# dtype: bool


df = df[~droprows]
print(df)

Output

       Country  Factor A  Factor B  Year
0  Afghanistan     153.0     3.575  2011
1  Afghanistan     141.0     3.794  2012
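
For comparison, here is a minimal sketch of an alternative that should give the same result with `groupby(...).filter`, which keeps or drops whole groups based on a True/False result. It reuses the sample data above; the name `cleaned` is only illustrative.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Factor B': [3.575, 3.794, None, 5.956],
    'Year': [2011, 2012, 2011, 2012]
})

# filter calls the function once per group and keeps the group's rows
# only when it returns True, i.e. when no cell in the group is NaN
cleaned = df.groupby('Country').filter(lambda g: g.notna().all().all())
print(cleaned)
```

`filter` reads more directly, but it calls the lambda once per group, so on data with many distinct names the `transform` mask may be the faster option.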
