1

I'm trying to fit a negative binomial distribution to counts data but scaled back to counts like in this example In my data, I have to separate out the binomial function plotting for two species. However, there is not easy way to specify this within the function and getting the line legends with parameter values in the key for both species.

set.seed(111)
count <- rbinom(500,100,0.1) 
species <- rep(c("A","B"),time = 250)
df <- data.frame(count,species)

#Specifying negative binomial function
negbinom.params <- fitdistr(df$count,"negative binomial", method = "SANN")$estimate
dist.params <- map(list(`Negative Binomial` = negbinom.params),~ map2(names(.),.,~ paste0(.x," = ",round(.y,2))) %>% unlist %>% paste0(.,collapse = ", ")) %>% map2_chr(names(.),., ~ paste(.x,.y,sep=":\n"))

#Plotting
mybinwidth = 2
ggplot(df, aes(x = count, colour = species, fill = species)) + 
  facet_grid(.~species) +  
  geom_histogram(aes(y=..count..),alpha = 0.5, binwidth = mybinwidth) + 
  stat_function(aes(color = "orange"), 
                fun = function(x,size, mu) {
                    mybinwidth * nrow(df) * dnbinom(x,size = size, mu = mu)
                },
                args=fitdistr(df$count, "negative binomial", method="SANN")$estimate, 
                xlim=c(0,50),n=20)

enter image description here

teunbrand
  • 33,645
  • 4
  • 37
  • 63
Rspacer
  • 2,369
  • 1
  • 14
  • 40

1 Answers1

2

You are right, this is a bit of a pain to get right. I've adapted your example a little bit to show two different distribution more clearly. Here is my attempt to make your approach work:

library(ggplot2)
library(MASS)
#> Warning: package 'MASS' was built under R version 3.6.2

set.seed(111)

df <- data.frame(
  count = rnbinom(500, rep(c(5, 10), each  = 250), 0.5),
  species = rep(c("A", 'B'), each = 250)
)

# Not the prettiest formatting, but it'll show the point
ests <- sapply(split(df$count, df$species), function(x) {
  est <- fitdistr(x, "negative binomial", method = "SANN")$estimate
  formatted <- paste0(names(est)[1], " = ", format(est, digits = 2)[1], ",",
                      names(est)[2], " = ", format(est, digits = 2)[2])
  formatted
})

mybinwidth <- 1

spec_A = df[df$species == "A",]
spec_B = df[df$species == "B",]

ggplot(df, aes(count)) +
  geom_histogram(binwidth = mybinwidth,
                 aes(fill = species), alpha = 0.5,
                 position = "identity") +
  stat_function(aes(color = "A"), 
                data = data.frame(species = "A"),
                fun = function(x, size, mu) {
                  mybinwidth * nrow(spec_A) * dnbinom(x,size = size, mu = mu)
                },
                args = fitdistr(spec_A$count, "negative binomial", method="SANN")$estimate, 
                xlim = c(0, max(df$count)), n= max(df$count) + 1, inherit.aes = FALSE) +
  stat_function(aes(color = "B"), 
                data = data.frame(species = "B"),
                fun = function(x, size, mu) {
                  mybinwidth * nrow(spec_B) * dnbinom(x,size = size, mu = mu)
                },
                args = fitdistr(spec_B$count, "negative binomial", method="SANN")$estimate, 
                xlim = c(0, max(df$count)), n= max(df$count) + 1, inherit.aes = FALSE) +
  scale_colour_discrete(labels = unname(ests), name = "fit") +
  facet_wrap(~ species)
#> Warning: `mapping` is not used by stat_function()
#> Warning: `data` is not used by stat_function()
#> Warning: `mapping` is not used by stat_function()
#> Warning: `data` is not used by stat_function()

Created on 2020-04-15 by the reprex package (v0.3.0)

There are also packages that do the majority of this work for you. Disclaimer: I wrote ggh4x, so I'm not unbiased. You can also replace the ggplot code with the following (assuming similar preprocessing)

library(ggh4x)
ggplot(df, aes(count)) +
  geom_histogram(binwidth = mybinwidth,
                 aes(fill = species), alpha = 0.5,
                 position = "identity") +
  stat_theodensity(aes(colour = species,
                       y = after_stat(count * mybinwidth)),
                   distri = "nbinom") +
  scale_colour_discrete(labels = unname(ests), name = "fit") +
  facet_wrap(~ species)

Hope that helped!

teunbrand
  • 33,645
  • 4
  • 37
  • 63
  • This is amazing! I'm still finding it a little hard to implement the alternative second method. RN, it looks like "ggh4x" is not a package in R. So I used the installation code that you had on. But it wont load from github and syas "Error: Failed to install 'ggh4x' from GitHub:(converted from warning) installation of package " – Rspacer Apr 15 '20 at 16:27
  • You're right that it's not on CRAN, I don't think it has enough body yet to put on CRAN. I got that error too on my local machine and for me it got fixed by reinstalling the Rcpp package. – teunbrand Apr 15 '20 at 16:44