Users of a Shiny app can test data sets for Poisson, normality, and exponentiality. I am returning the results of the statistical test they chose. In addition, I thought it would be nice to plot the density from the data along with the theoretical distribution. They could be testing multiple sets of data at once, so I am faceting the plot.
From ggplot add Normal Distribution while using `facet_wrap` I found the really great ggh4x package. However, since this could be industry data, there may be a minimum that is not zero.
The problem is that theodensity(distri="exp")
uses dexp which doesn't account for a minimum number, so the theoretical distribution plot doesn't match the data.
How can I tell the stat_theodensity that there is an xmin for each facet, which is the min of the data in the facet? I see that fitdistrplus
can use different methods to fit an exponential curve, and that, for example, method="mse"
would work. Is there a way to pass this through stat_theodensity
?
library(ggh4x)
#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/100)+100
data2 <- rexp(n = 500,rate = 1/250)+500
data <- c(data1,data2)
#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)
#make the data for plotting
plot_dat <- data.frame(ID,data)
#make the graph
p <- ggplot(data = plot_dat, aes(x=data))+
geom_density()+
stat_theodensity(distri = "exp")+
facet_wrap(facets = ~ID,scales = "free")
p
#what the first point of the graphs should be
dexp(x = 100-100,rate = 1/100)
#[1] 0.01
dexp(x = 500-500,rate = 1/250)
#[1] 0.004
********EDIT
OK I am getting closer. The following code works, but only for the second pass through the loop. If I change the numbers around for data1 and data2, it is always only the second one that plots the theoretical distribution.
I did ggplot_build after the first loop and it gives an error in fitdist(), which is code 100. I don't know why it would always fail on the first one but not on the second one, even with the same data.
Any ideas?
#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/250)+500
data2 <- rexp(n = 500,rate = 1/100)+250
data <- c(data1,data2)
#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)
#make the data for plotting
plot_dat <- data.frame(ID,data)
#make the graph
p <- ggplot(data = plot_dat, aes(x=data))+
geom_density(color="red")
#loop through sets and add facets
for (set in unique(plot_dat$ID)){
xmin <- min(plot_dat$data[ID == set])
p<-p+
stat_theodensity(
data = ~subset(.x, ID == set),
aes(x = stage(data - xmin, after_stat = x + xmin)),
distri = "exp"
)
}
#stat_theodensity(distri = "exp")+
p<-p+
facet_wrap(facets = ~ID,scales = "free")
p