
I have a CSV file that contains Gender and Married (marital status) columns, along with a few others, like this:

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0,,360,1,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358,120,360,1,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0,141,360,1,Urban,Y
LP001011,Male,Yes,2,Graduate,Yes,5417,4196,267,360,1,Urban,Y

I want to count the number of married and unmarried males and females and show the counts in a graph like the one below.

Here is the code I am using:

import pandas as pd
import matplotlib.pyplot as plt

if __name__ == '__main__':
    # Read only the Gender and Married columns (positions 1 and 2)
    # and drop rows where either value is missing
    df = pd.read_csv(
        "/home/train.csv", usecols=[1, 2]).dropna(subset=['Gender', 'Married'])
    # Count the rows in each (Gender, Married) combination
    groups = df.groupby(['Gender', 'Married'])['Married'].apply(lambda x: x.count())
    print(groups)

After the groupby, I get the following result:

Gender  Married
Female  No          80
        Yes         31
Male    No         130
        Yes        357
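A minimal sketch that reproduces the question's pipeline on the six sample rows shown above. It reads them from an in-memory string with io.StringIO instead of /home/train.csv (the SAMPLE_CSV name is illustrative), and it replaces the apply(lambda ...) call with the equivalent built-in .count(). Because every applicant in these six rows is male, the counts naturally differ from the full-file output above.

```python
import io

import pandas as pd

# The six sample rows from the question; the real code reads /home/train.csv
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0,,360,1,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358,120,360,1,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0,141,360,1,Urban,Y
LP001011,Male,Yes,2,Graduate,Yes,5417,4196,267,360,1,Urban,Y
"""

# Keep only the Gender and Married columns and drop rows missing either value
df = pd.read_csv(io.StringIO(SAMPLE_CSV), usecols=["Gender", "Married"]).dropna(
    subset=["Gender", "Married"]
)

# Same result as .apply(lambda x: x.count()), without the lambda
groups = df.groupby(["Gender", "Married"])["Married"].count()
print(groups)
```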

I want the following chart:

[image of the desired chart not available]

Asked by pythonaddict; edited by Trenton McKinney.

1 Answer


You can use groupby + size and then Series.plot.bar:

(On the difference between count and size: size counts every row in each group, while count counts only the non-missing values.)

groups = df.groupby(['Gender','Married']).size()
groups.plot.bar()
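To illustrate the count/size difference, here is a sketch on a small made-up frame (not the question's file) in which one LoanAmount is missing, as it is for LP001002 in the sample. size reports the number of rows per group, while count on the LoanAmount column skips the NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data; LoanAmount is missing for the first row
df = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Male", "Female"],
        "Married": ["No", "Yes", "No", "Yes"],
        "LoanAmount": [np.nan, 128.0, 141.0, 66.0],
    }
)

grouped = df.groupby(["Gender", "Married"])

# size(): number of rows in each group, whether or not values are missing
sizes = grouped.size()

# count(): number of non-missing LoanAmount values in each group
counts = grouped["LoanAmount"].count()

print(sizes)
print(counts)
```

For the Gender/Married chart both give the same numbers, because rows with a missing Gender or Married value are already removed by dropna (and groupby excludes NaN keys by default).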


Another solution is to add unstack to reshape the result into a table, or to use crosstab:

print(df.groupby(['Gender','Married']).size().unstack(fill_value=0))
Married   No  Yes
Gender           
Female    80   31
Male     130  357

df.groupby(['Gender','Married']).size().unstack(fill_value=0).plot.bar()

Or:

pd.crosstab(df['Gender'],df['Married']).plot.bar()
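A sketch of why fill_value=0 matters and how the two approaches line up, using a made-up frame (an assumption, not the question's data) with no unmarried female applicant. Without fill_value, unstack leaves that missing cell as NaN; crosstab fills combinations that never occur with 0 on its own, so both produce the same Gender-by-Married count table.

```python
import pandas as pd

# Hypothetical data: there is no (Female, No) row
df = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Male", "Female"],
        "Married": ["No", "Yes", "Yes", "Yes"],
    }
)

# Series indexed by a (Gender, Married) MultiIndex
sizes = df.groupby(["Gender", "Married"]).size()

# unstack() pivots the last index level (Married) into columns;
# fill_value=0 puts 0 in the missing (Female, No) cell instead of NaN
via_groupby = sizes.unstack(fill_value=0)

# crosstab counts each Gender/Married combination directly, filling gaps with 0
via_crosstab = pd.crosstab(df["Gender"], df["Married"])

print(via_groupby)
print(via_crosstab)
```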


Answered by jezrael; edited by Graham.
  • Do you use Spyder/Anaconda? – jezrael Jul 25 '17 at 09:51
  • Otherwise, I think you can run `import matplotlib.pyplot as plt` first, then `df.groupby(['Gender','Married']).size().unstack(fill_value=0).plot.bar()`, and finally `plt.show()`. – jezrael Jul 25 '17 at 09:54
  • Could you please explain what is happening in the solution above, specifically in `df.groupby(['Gender','Married']).size().unstack(fill_value=0)`? What does `unstack` do here? Thanks in advance – pythonaddict Jul 25 '17 at 10:05
  • `unstack` takes the last level of the MultiIndex (here `Married`) and pivots it into the columns: each unique value becomes a column name and the data are rearranged under those columns. `fill_value=0` puts 0 wherever a combination does not occur. This is called pivoting. – jezrael Jul 25 '17 at 10:11
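Putting the steps from the comments together, a minimal script might look like the following. It rebuilds the counts from the groupby output printed in the question rather than reading /home/train.csv, and the y-axis label and rot=0 are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the groupby output shown in the question
sizes = pd.Series(
    [80, 31, 130, 357],
    index=pd.MultiIndex.from_tuples(
        [("Female", "No"), ("Female", "Yes"), ("Male", "No"), ("Male", "Yes")],
        names=["Gender", "Married"],
    ),
)

# One group of bars per Gender, one bar per Married value
ax = sizes.unstack(fill_value=0).plot.bar(rot=0)
ax.set_ylabel("Number of applicants")

# Outside an IDE that renders plots inline, show() opens the plot window
plt.show()
```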