How to combine two histograms in Python

Question

male[['Gender','Age']].plot(kind='hist', x='Gender', y='Age', bins=50)
female[['Gender','Age']].plot(kind='hist', x='Gender', y='Age', bins=50)

I used data from a file to create two histograms of age, one per gender. I split the data by gender at the start so I could plot each group separately. Now I'm having a hard time putting the two histograms together on one plot.

Possible duplicate of [Plot two histograms at the same time with matplotlib](http://stackoverflow.com/questions/6871201/plot-two-histograms-at-the-same-time-with-matplotlib) — splinter, Apr 16 '17 at 18:01
I looked at that before, but when I try something similar, plt.hist(male, label='x') followed by plt.hist(female, label='y'), it gives me TypeError: len() of unsized object — ksalerno, Apr 16 '17 at 18:03
Then show the code that produces this error and ask about the error itself. — splinter, Apr 16 '17 at 18:06

score 1 · Answer 1 · answered Apr 16 '17 at 21:10

As mentioned in the comments, you can do this with matplotlib. I haven't figured out how to plot two histograms using pandas, though (I would like to see how people have done that).

import matplotlib.pyplot as plt
import random

# example data
age = [random.randint(20, 40) for _ in range(100)]
sex = [random.choice(['M', 'F']) for _ in range(100)]

# pass one list of ages per gender, plus a matching color for each
plt.hist([[a for a, s in zip(age, sex) if s == 'M'],
          [a for a, s in zip(age, sex) if s == 'F']],
         color=['b', 'r'], alpha=0.5, bins=10)
plt.show()
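
For a pandas-only variant, one possible sketch is to draw the first histogram and pass its Axes to the second call via ax=. The male and female frames and the Gender and Age columns follow the question; the sample data, the shared bin edges, and the colors are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data standing in for the file from the question
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender": rng.choice(["M", "F"], size=100),
    "Age": rng.integers(20, 41, size=100),
})
male = df[df["Gender"] == "M"]
female = df[df["Gender"] == "F"]

# Shared bin edges so both histograms use the same bars (an assumption;
# an integer bin count would bin each group over its own range)
bins = np.linspace(df["Age"].min(), df["Age"].max(), 11)

# Draw the first histogram, then reuse its Axes for the second one
ax = male["Age"].plot(kind="hist", bins=bins, alpha=0.5, color="b", label="male")
female["Age"].plot(kind="hist", bins=bins, alpha=0.5, color="r", label="female", ax=ax)
ax.set_xlabel("Age")
ax.legend(loc="upper right")
plt.show()
```

Selecting the single Age column before plotting matters here: the Gender column is non-numeric and has no place in a histogram of ages.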

score 0 · Answer 2 · answered Apr 16 '17 at 21:10

Consider converting the dataframes to a two-column NumPy array. matplotlib's hist works with that structure, but not with two pandas dataframes of different lengths that contain non-numeric columns. Pandas' join binds the two columns, MaleAge and FemaleAge, into one frame; with an outer join, the shorter column is padded with NaN.

Here, the Gender column is dropped, and the two series are labeled manually in column order.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

...
# RESET INDEX AND RENAME COLUMN AFTER SUBSETTING
male = df2[df2['Gender'] == "M"].reset_index(drop=True).rename(columns={'Age':'MaleAge'})
female = df2[df2['Gender'] == "F"].reset_index(drop=True).rename(columns={'Age':'FemaleAge'})

# OUTER JOIN TO ACHIEVE SAME LENGTH
gendermat = np.array(male[['MaleAge']].join(female[['FemaleAge']], how='outer'))

plt.hist(gendermat, bins=50, label=['male', 'female'])
plt.legend(loc='upper right')
plt.show()
plt.clf()
plt.close()
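
To see what the outer join step produces, here is a tiny sketch with made-up ages (the values are hypothetical and not taken from the original data). Because both frames were given a fresh 0..n-1 index, join aligns them row by row and fills the shorter column with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical ages: three males, two females
male = pd.DataFrame({"MaleAge": [25, 31, 40]})
female = pd.DataFrame({"FemaleAge": [22, 35]})

# Outer join on the index keeps every row from both frames
joined = male.join(female, how="outer")
print(joined)

# Two-column float array; the last FemaleAge entry is NaN
gendermat = joined.to_numpy()
print(gendermat.shape)  # (3, 2)
```

This is why reset_index(drop=True) is applied after subsetting: without it, the male and female rows would keep their original, non-overlapping index labels and the join would not line them up into a compact array.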
