Suppose I have the following data:
df = pd.DataFrame({
'state':['CA', 'WA', 'CA', 'AZ','AZ','WA','WA','CA'],
'gender':['M','F','M','F', 'F','F','M','M']
})
What I am trying to do is create a dataframe via aggregation that shows the 'M'and 'F' breakdown by state. Right now I am able to get the frequency of each category pair which turns out to be:
df.groupby(['state', 'gender']).size()
state gender
AZ F 2
CA M 3
WA F 2
M 1
My first question with this problem is how can I get the previous table to also show the fact that CA has 0 Fs and AZ has 0 Ms. Ideally I would like to have something that appears like the following:
state gender
AZ F 2
M 0
CA M 3
F 0
WA F 2
M 1
Finally, what I am attempting to do with this is create percentage breakdowns based on frequency and the full total in the state. That would look like the following:
state gender
AZ F 100.00
M 0.00
CA M 100.00
F 0.00
WA F 66.66
M 33.33
This comes close to what I need but it doesn't handle frequency and instances where a certain category is zero. Can anyone help?