I have a data frame like this:
> head(a)
FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno
1 fam1000-03 G1000 1 1 38 10.2 1 control
2 fam1001-03 G1001 1 1 15 7.3 1 control
3 fam1003-03 G1003 1 2 17 7.0 1 case
4 fam1005-03 G1005 1 1 36 7.7 1 control
5 fam1009-03 G1009 1 1 23 7.6 1 control
6 fam1052-03 G1052 1 1 32 7.3 1 control
My data frame has 1698 observations, of which 828 have "case" in the pheno
column and 836 have "control".
I make a histogram via:
library(ggplot2)
ggplot(a, aes(x=HBA1C, fill=pheno)) +
geom_histogram(binwidth=.5, position="dodge")
I would like the y-axis to show a percentage instead of the count. The percentage should be calculated within each pheno group, so each bar shows the share of the "case" individuals (or of the "control" individuals) that falls into that HBA1C bin. I also have NAs in my plot, and I would like to exclude them.
I guess I can remove NAs from pheno with this:
ggplot(data=subset(a, !is.na(pheno)), aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, position="dodge")
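For the percentage part, here is a minimal sketch of what I think might work. It relies on the density that geom_histogram computes separately for each fill group, and it assumes ggplot2 3.3.0 or later for after_stat(). The data frame a_demo is a made-up stand-in for a, used only so the example runs on its own:

```r
library(ggplot2)

# Made-up stand-in for `a` (hypothetical values, not my real data)
set.seed(1)
a_demo <- data.frame(
  HBA1C = round(rnorm(200, mean = 7.5, sd = 1), 1),
  pheno = sample(c("case", "control", NA), 200, replace = TRUE)
)

# density is computed per group (here: per pheno level), so
# density * bin width is the fraction of that group in each bin;
# multiplying by 100 turns it into a percentage.
p <- ggplot(subset(a_demo, !is.na(pheno)), aes(x = HBA1C, fill = pheno)) +
  geom_histogram(
    aes(y = after_stat(density * width * 100)),
    binwidth = 0.5, position = "dodge"
  ) +
  ylab("Percentage within pheno group")

print(p)
```

If this is right, the bars for "case" should add up to 100 and so should the bars for "control", regardless of how many individuals are in each group.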