1

enter image description here

I would like to find the proportion of missing values of my features on each country and on all years to select the countries.

I tried this:

df[indicators].isna().mean().sort_values(ascending=False)

but it gives me the proportion of missing values for each indicator only...

i would like this output :

enter image description here

Giordano
  • 37
  • 6

1 Answers1

0

You can use DataFrame.melt for reshape and then aggregate mean of missing values:

df1 = (df.melt(id_vars='Country Name', value_vars=indicators)
         .set_index('Country Name')['value'].isna()
         .groupby('Country Name')
         .mean()
         .reset_index(name='Prop'))

Or reshape by DataFrame.stack:

df1 = (df.set_index('Country Name')[indicators]
         .stack(dropna=False)
         .isna()
         .groupby('Country Name')
         .mean()
         .reset_index(name='Prop')
        )

Or use custom function:

df1 = (df.groupby('Country Name')[indicators]
         .apply(lambda x: np.mean(x.isna().to_numpy(), axis=None))
         .reset_index(name='Prop'))
                   
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252