3

I have some count data. I want to plot histogram with the count data and add the negative binomial, normal, and Poisson density function but fit the functions to the count data.

I tried following this example but (a) I have trouble fitting the negative binomial and poisson functions (b) No where close to scaling it to the count data level (c) Dont know how to fit all three on same graph with legends for each line (d) Also, how can I get basic stats of each fit? for example, the neg binomial fit will generate a parameter k. How can I get that to appear on the plot

set.seed(111)
counts <- rbinom(500,100,0.1) 
df <- data.frame(counts)

ggplot(df, aes(x = counts)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white") +
  stat_function(fun=dnorm,args=fitdistr(df$counts,"normal")$estimate)

ggplot(df, aes(x = counts)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white") +
  stat_function(fun=poisson,args=fitdistr(df$counts,"poisson")$estimate)

ggplot(df, aes(x = counts)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white") +
  stat_function(fun=dnbinom,args=fitdistr(df$counts,"dnbinom")$estimate)
Rspacer
  • 2,369
  • 1
  • 14
  • 40

2 Answers2

5

You have a few issues, first "dnbinom" is not a valid distribution for MASS::fitdistr. Secondly, MASS::fitdistr wasn't able to get a fit with the default method, so we can use method = "SANN". Thirdly, stat_function tries to evaluate dnbinom at non-integer values unless you tell it otherwise, which doesn't work.

Getting the parameters to show in the legend is a little tricky, because you'll have to estimate them outside of the ggplot call. I was lazy and used purrr::map2, but you could use some base R functions to do the same thing.

library(purrr)
library(dplyr)
norm.params <- fitdistr(df$counts,"normal")$estimate
poisson.params <- fitdistr(df$counts,"poisson")$estimate
negbinom.params <- fitdistr(df$counts,"negative binomial", method = "SANN")$estimate

dist.params <- map(list(Normal = norm.params,Poisson = poisson.params,`Negative Binomial` = negbinom.params),
    ~ map2(names(.),.,~ paste0(.x," = ",round(.y,2))) %>% unlist %>% paste0(.,collapse = ", ")) %>% 
    map2_chr(names(.),., ~ paste(.x,.y,sep=":\n"))

Finally, if we want to scale by counts, as found in this answer, we just define anonymous functions.

mybinwidth = 1
ggplot(df, aes(x = counts)) + 
  geom_histogram(aes(y=..count..),colour = "black", fill = "white", binwidth = mybinwidth) +
  stat_function(aes(color = "black"),fun=function(x,mean,sd) mybinwidth * nrow(df) * dnorm(x,mean, sd),
                args=fitdistr(df$counts,"normal")$estimate) +
  stat_function(aes(color = "blue"),fun=function(x,lambda) mybinwidth * nrow(df) * dpois(x,lambda), 
                args=fitdistr(df$counts,"poisson")$estimate,
                xlim=c(1,20), n=20) + 
  stat_function(aes(color = "orange"),fun=function(x,size, mu) mybinwidth * nrow(df) * dnbinom(x,size = size, mu = mu),
                args=fitdistr(df$counts,"negative binomial", method="SANN")$estimate,
                xlim=c(1,20),n=20) + 
  scale_color_manual("Distribution", values=c(black="black",blue="blue",orange="orange"),
                     labels=dist.params)

enter image description here

Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • Thank you! This is quite helpful. Would it be be possible to scale the density to original frequency like in the example post? I'm still struggling with that. Also, for the negative binom distribution I get an error saying there is not method called "SAMM"..shows me a "SANN" option. The resultant plot params are different. Any idea why? – Rspacer Apr 13 '20 at 13:27
  • "SAMM" was a typo on my part. I also added the scaling function. – Ian Campbell Apr 13 '20 at 14:11
0

A solution for the Poisson density.

library(MASS)
library(ggplot2)
ggplot(df, aes(x = counts)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white", bins=20) +
  stat_function(fun=dpois, args=fitdistr(df$counts,"poisson")$estimate, 
                xlim=c(0,19), n=20, size=1) +
  theme_bw()

enter image description here

And here is a solution for the negative binomial.

ggplot(df, aes(x = counts)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white", bins=20) +
  stat_function(fun=dnbinom, 
                args=fitdistr(df$counts,"negative binomial", method="SANN")$estimate, 
                xlim=c(0,19), n=20, size=1) +
  theme_bw()

enter image description here

Marco Sandri
  • 23,289
  • 7
  • 54
  • 58