I am trying to make a histogram with ggplot where over 95% of the data is 0 and the rest is between 1 and 55. I do not want to show the 0s on the histogram, but I do want them counted in the denominator when the percentages are computed, so that the percentages for the other values stay low. I have tried two approaches, but in both the 0s are left out of the calculation, so the percentages for the rest of the data come out wrong.
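To make the calculation I am after concrete, here is a minimal sketch in base R on made-up data. The vector `total` below is hypothetical (95 zeros and 5 nonzero values standing in for one panel), not my real `set1`. It only shows the arithmetic, not the plot.

```r
# Hypothetical toy data: 95 zeros and 5 nonzero donations (not the real set1).
total <- c(rep(0, 95), 1, 1, 5, 10, 50)

# Count each nonzero value.
nonzero_counts <- table(total[total > 0])

# What I want: divide by ALL rows, zeros included.
pct_all <- as.numeric(nonzero_counts) / length(total)
names(pct_all) <- names(nonzero_counts)

# What I seem to be getting: the denominator only contains the nonzero rows.
pct_filtered <- as.numeric(nonzero_counts) / sum(total > 0)

print(round(pct_all, 3))
print(round(pct_filtered, 3))
```

With this toy data the value 1 should show as 2% (2 out of 100 rows), whereas dividing by the nonzero rows alone gives 40% (2 out of 5).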
My first approach was this:
# Approach 1: drop the 0s before plotting
set1 %>%
  filter(total > 0) %>%
  ggplot(aes(x = total, fill = lowcost)) +
  geom_histogram(binwidth = 1, aes(y = (..count..) / sum(..count..)), col = I("black")) +
  scale_color_grey() +
  scale_fill_grey(start = .85, end = .85) +
  theme_linedraw() +
  guides(fill = "none", cols = "none") +
  geom_vline(aes(xintercept = 10, size = "Low target"),
             color = "black", linetype = 5) +
  geom_vline(aes(xintercept = 50, size = "High target"),
             color = "black", linetype = "dotted") +
  scale_size_manual(values = c(.5, 0.5),
                    guide = guide_legend(title = "Target",
                                         override.aes = list(linetype = c(3, 5), color = c("black", "black")))) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = c(seq(0, 50, 10), 55), labels = c(seq(0, 50, 10), ">55"), limits = c(0, 60)) +
  facet_grid(cols = vars(lowcost)) +
  ggtitle("Ask Set 1 ") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Total donation ($)") +
  ylab("Percent")
My second approach was to keep the 0s in the data and instead set the x-axis limits to exclude them, but this did not work either:
# Approach 2: keep the 0s, but start the x-axis limits just above 0
set1 %>%
  ggplot(aes(x = total, fill = lowcost)) +
  geom_histogram(binwidth = 1, aes(y = (..count..) / sum(..count..)), col = I("black")) +
  scale_color_grey() +
  scale_fill_grey(start = .85, end = .85) +
  theme_linedraw() +
  guides(fill = "none", cols = "none") +
  geom_vline(aes(xintercept = 10, size = "Low target"),
             color = "black", linetype = 5) +
  geom_vline(aes(xintercept = 50, size = "High target"),
             color = "black", linetype = "dotted") +
  scale_size_manual(values = c(.5, 0.5),
                    guide = guide_legend(title = "Target",
                                         override.aes = list(linetype = c(3, 5), color = c("black", "black")))) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = c(seq(0, 50, 10), 55), labels = c(seq(0, 50, 10), ">55"), limits = c(0.01, 60)) +
  facet_grid(cols = vars(lowcost)) +
  ggtitle("Ask Set 1 ") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Total donation ($)") +
  ylab("Percent")
Both approaches produce histograms that look like this: The tallest bar in the left histogram should actually be 1.19%.
The percents should be the following in the histogram on the left:
The percents should be the following in the histogram on the right: