
I need to remove all rows that share a name when any of the rows with that name has missing data.

See the picture for a sample. I'd like to remove BOTH rows for the country Belize when any of the Belize rows has missing info. Here Belize is missing data for 2011, so the 2012 row for Belize needs to be removed too.

What's an efficient way to code this so that it applies to the whole dataset in Python?

KC_Berlin
  • Does this answer your question? [Apply vs transform on a group object](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object) – RichieV Sep 04 '20 at 13:03
  • Welcome KC, please take a look at that other question. You can create a column stating whether any of the country's rows have `NaN`, then delete those rows. – RichieV Sep 04 '20 at 13:05
  • Removing the row with NA is easy with `.dropna()`; how do I also remove the other rows with the same name? – KC_Berlin Sep 04 '20 at 13:16
  • Is there a specific issue? Please see [ask], [help/on-topic]. Also, please do not share information as images unless absolutely necessary. See: https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors, https://idownvotedbecau.se/imageofcode, https://idownvotedbecau.se/imageofanexception/. – AMC Sep 05 '20 at 00:57
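RichieV's comment suggests flagging every country that has a `NaN` in any of its rows and then deleting the flagged rows. A minimal sketch of that idea, assuming the data looks like the sample DataFrame reconstructed in the second answer (the original was only an image); the helper column name `country_has_nan` is hypothetical:

```python
import pandas as pd

# Assumed sample data, mirroring the reconstruction in the second answer
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Factor B': [3.575, 3.794, None, 5.956],
    'Year': [2011, 2012, 2011, 2012],
})

# Hypothetical helper column: True on every row whose country has at least one NaN
df['country_has_nan'] = (
    df.isna().any(axis=1)       # per row: does this row contain any NaN?
    .groupby(df['Country'])     # group those per-row flags by country
    .transform('any')           # broadcast each country's result back to its rows
)

# Keep only countries with complete data, then drop the helper column
result = df[~df['country_has_nan']].drop(columns='country_has_nan')
print(result)
```

The flag is computed with vectorized `isna`/`transform('any')`, so no Python function is called per group.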

2 Answers

Try this:

df.dropna(subset=["Factor A"], inplace=True)
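`dropna` on its own removes only the rows that contain `NaN`, so the Belize 2012 row survives (as the asker notes in the comments). A sketch of one way to extend it, assuming the same sample data as the other answer: collect the countries that have a `NaN` in any column, then drop every row for those countries with `isin`.

```python
import pandas as pd

# Assumed sample data, mirroring the other answer's reconstruction
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Factor B': [3.575, 3.794, None, 5.956],
    'Year': [2011, 2012, 2011, 2012],
})

# dropna alone only removes Belize 2011; Belize 2012 is still present
only_nan_rows_dropped = df.dropna(subset=['Factor A'])
print(only_nan_rows_dropped)

# Countries with a NaN anywhere in any of their rows
bad_countries = df.loc[df.isna().any(axis=1), 'Country'].unique()

# Remove every row that belongs to one of those countries
result = df[~df['Country'].isin(bad_countries)]
print(result)
```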

Subasri sridhar

As mentioned in the comments, you can use transform to create a series and use it as a boolean mask to drop the desired rows.

import pandas as pd

# sample data, please always provide in this form so we can paste in our tests
# you could get it with `df.head().to_dict('list')`
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Factor B': [3.575, 3.794, None, 5.956],
    'Year': [2011, 2012, 2011, 2012]
})

droprows = (
    df.groupby('Country') # group the rows by Country
    .transform(lambda x: x.isna().any())
        # .transform applies a function and returns the same scalar value
            # for all rows in the group
        # x.isna() returns True if a cell contains NaN, element-wise
        # .any() aggregates and returns a scalar True/False per group
        # the line returns a DataFrame with the same rows as df
            # but one column fewer: 'Country' (the grouping key)
    .any(axis=1) # collapse that result into a single column
)
print(droprows)
# 0    False
# 1    False
# 2     True
# 3     True
# dtype: bool


df = df[~droprows]
print(df)

Output

       Country  Factor A  Factor B  Year
0  Afghanistan     153.0     3.575  2011
1  Afghanistan     141.0     3.794  2012
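An alternative sketch, not part of the answer above, uses `groupby(...).filter`, which keeps or drops whole groups based on one boolean per group. It assumes the same sample `df`:

```python
import pandas as pd

# Assumed sample data, same as in the answer above
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Belize', 'Belize'],
    'Factor A': [153, 141, None, 50],
    'Factor B': [3.575, 3.794, None, 5.956],
    'Year': [2011, 2012, 2011, 2012],
})

# Keep a country's rows only if none of its cells are NaN.
# g.notna().all() gives one bool per column; the second .all() collapses it to one bool.
result = df.groupby('Country').filter(lambda g: bool(g.notna().all().all()))
print(result)
```

`filter` reads clearly, but it calls a Python function once per group, so the vectorized `transform` mask may be faster when there are many countries.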
RichieV