Plotting missing values per category as a proportion of total values in each category in matplotlib

Question

I have 2 columns in dataset, one is Country other is Gender , Gender has some NaN or N/A values. In pandas it look like so:

import pandas as pd    
Country = ['United Kingdom', 'Bosnia and Herzegovina', 'Thailand', 'United States', 'Ukraine', 'Canada', 'Ukraine', 'India', 'New Zealand', 'India', 'Antigua and Barbuda', 'Canada', 'United States', 'Germany', 'India', 'United Kingdom', 'Australia', 'Russian Federation', 'Brazil', 'Lithuania']
Gender = ['Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', nan, 'Man', 'Woman', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man']

I want to plot missing values as a proportion of all values of Gender in a Country category. I first use groupby on Country column which gave me only total count of gender within a country, what will be the best way to plot missing values as proportion of total gender value in a country:

EDIT: Answered this question in comments, please view solution, if you think the solution could be better please comment under the solution comment.

Don't put images of your data. Check out [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — Diziet Asahi, Apr 30 '20 at 20:32
@DizietAsahi I have improved my question and also found and wrote solution for it, will you please remove -1 mark from my question? — Sidrah Madiha Siddiqui, Mar 14 '21 at 19:15

Sidrah Madiha Siddiqui · Answer 1 · 2020-05-01T17:23:09.463

Hi here is I created a solution for it,

first I created a similar dataframe that has all values in Gender = NaN,
then I replaced NaN with 1s
I grouped this dataframe by Country and summed Gender column count
then on the original data frame I again filled NaNs with 1s (This could be done in first step)
then grouped this dataframe by Country and summed Gender column
I joined these 2 dataframes on axis 1
I created a column that takes dataframe 1 Gender count (count of Gender = NaN per country), multilplies by 100 and divides by the second data frame Gender count
Then I plotted this new column as a bar plot

    Country = ['United Kingdom', 'Bosnia and Herzegovina', 'Thailand', 'United States', 'Ukraine', 'Canada', 'Ukraine', 'India', 'New Zealand', 'India', 'Antigua and Barbuda', 'Canada', 'United States', 'Germany', 'India', 'United Kingdom', 'Australia', 'Russian Federation', 'Brazil', 'Lithuania']
    Gender = ['Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', nan, 'Man', 'Woman', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man', 'Man']
    survey = pd.DataFrame({ 'Country' :Country , 'Gender':Gender})
    null_gender["Gender"].fillna(1, inplace = True) # filling 1 instead of NaN for the count in next line of code
    freq = null_gender.groupby('Country')['Gender'].count().rename('NullCount') # Number of NaN in Gender Column per country

    survey["Gender"].fillna(1, inplace = True)
    freq_full =survey.groupby('Country')['Gender'].count().rename('Totalcount')
    total_freq = pd.concat([freq, freq_full], axis=1)
    total_freq = total_freq.dropna(how='any',axis=0) 
    total_freq['null_percent'] = total_freq.apply(lambda row: round((row.NullCount*100) /row.Totalcount, 2) , axis = 1) 
    total_freq.sort_values('null_percent', ascending=True, inplace=True)
    total_freq['null_percent'].plot(kind='barh', figsize=(10,25))

Plotting missing values per category as a proportion of total values in each category in matplotlib

1 Answers1