3
male[['Gender','Age']].plot(kind='hist', x='Gender', y='Age', bins=50)
female[['Gender','Age']].plot(kind='hist', x='Gender', y='Age', bins=50)

I used data from a file to create two histograms of age, one per gender. I split the data by gender at the start so that I could plot each group separately. Now I'm having a hard time combining the two histograms into one plot.

ksalerno

2 Answers

1

As mentioned in the comment, you can use matplotlib for this. I haven't figured out how to plot two histograms with pandas alone, though (I would like to see how people have done that).

import matplotlib.pyplot as plt
import random

# example data
age = [random.randint(20, 40) for _ in range(100)]
sex = [random.choice(['M', 'F']) for _ in range(100)]

# just give a list of age of male/female and corresponding color here
plt.hist([[a for a, s in zip(age, sex) if s=='M'], 
          [a for a, s in zip(age, sex) if s=='F']], 
         color=['b','r'], alpha=0.5, bins=10)
plt.show()
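
For the pandas side of the question, one possible approach is to draw both groups onto the same matplotlib axes through the `ax` argument of `Series.plot`. This is only a sketch: the DataFrame `df`, its random contents, and the choice of 10 shared bins are assumptions made for illustration, not the asker's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() to view interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data (column names taken from the question)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender": rng.choice(["M", "F"], size=100),
    "Age": rng.integers(20, 41, size=100),
})

# Shared bin edges so that both histograms use identical bins
bins = np.linspace(df["Age"].min(), df["Age"].max(), 11)

fig, ax = plt.subplots()
for gender, color in [("M", "b"), ("F", "r")]:
    # Passing ax=ax makes pandas draw on the existing axes instead of a new figure
    df.loc[df["Gender"] == gender, "Age"].plot(
        kind="hist", bins=bins, alpha=0.5, color=color, label=gender, ax=ax
    )
ax.set_xlabel("Age")
ax.legend()
```

Unlike the list form of plt.hist above, which places the bars side by side, this overlays the two histograms, so alpha matters for readability.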
titipata
0

Consider converting the dataframes to a two-column NumPy array, since matplotlib's hist treats each column of a 2-D array as a separate dataset. The two subsets cannot be stacked directly because they differ in length and carry a non-numeric Gender column. Pandas' join binds the two columns, MaleAge and FemaleAge, and the outer join pads the shorter one with NaN.

Here, the Gender column is dropped, and the legend labels are assigned manually in column order.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

...
# RESET INDEX AND RENAME COLUMN AFTER SUBSETTING
male = df2[df2['Gender'] == "M"].reset_index(drop=True).rename(columns={'Age':'MaleAge'})
female = df2[df2['Gender'] == "F"].reset_index(drop=True).rename(columns={'Age':'FemaleAge'})

# OUTER JOIN TO ACHIEVE SAME LENGTH
gendermat = np.array(male[['MaleAge']].join(female[['FemaleAge']], how='outer'))

plt.hist(gendermat, bins=50, label=['male', 'female'])
plt.legend(loc='upper right')
plt.show()
plt.clf()
plt.close()
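
To make the join step concrete, here is a small sketch of what the outer join produces. The five-row `df2` is a made-up stand-in for the real data; only the column names come from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: three males, two females
df2 = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "F"],
    "Age": [31, 25, 40, 22, 37],
})

male = df2[df2["Gender"] == "M"].reset_index(drop=True).rename(columns={"Age": "MaleAge"})
female = df2[df2["Gender"] == "F"].reset_index(drop=True).rename(columns={"Age": "FemaleAge"})

# The outer join aligns on the reset index; the shorter column is padded with NaN
joined = male[["MaleAge"]].join(female[["FemaleAge"]], how="outer")
gendermat = joined.to_numpy()

print(joined)
print(gendermat.shape)                 # (3, 2)
print(int(np.isnan(gendermat).sum()))  # 1
```

The reset_index(drop=True) calls matter here: without them the two subsets would keep their original row labels and the join would not line them up row by row.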
Parfait