After years of lurking around on stack overflow, I finally post my first question as I cannot find a post describing my problem.
For one aspect of a project I plot the distribution of a parameter (orientation) contained in a dataframe (df) to find out that it adopts a bimodal distribution. I am showing here the dataframe for the "2.7" sample:
A tibble: 5,280 x 13
ID number_of_points length bend average_curvatu~ relative_z_chan~ average_z_height orientation Depth Scale AspectR
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 14-3~ 935 940. 1.33 177. 0.291 171. 154. 0.35 6 2.7
2 14-3~ 629 629. 1.07 235. 0.346 467. 29.2 0.35 6 2.7
3 14-3~ 550 562. 1.18 159. 0.402 286. 22.5 0.35 6 2.7
4 14-3~ 334 334. 1.03 322. 0.507 444. 37 0.35 6 2.7
5 14-3~ 397 397. 1.01 292. 0.484 415. 16.4 0.35 6 2.7
6 14-3~ 1132 1135. 1.06 246. 0.301 401. 31.1 0.35 6 2.7
7 14-3~ 1169 1175. 1.14 179. 0.255 370. 11.9 0.35 6 2.7
8 14-3~ 1363 1366. 1.04 273. 0.183 383. 23.1 0.35 6 2.7
9 14-3~ 841 843. 1.09 274. 0.310 307. 21.5 0.35 6 2.7
10 14-3~ 881 883. 1.16 210. 0.226 451. 164. 0.35 6 2.7
# ... with 5,270 more rows, and 2 more variables: Circularity <dbl>, lam <chr>
Using normalmixEM I am able to determine the parameters of the 2 gaussians that model the bimodal distribution.
my_mix <- mixtools::normalmixEM(df$orientation, lambda = NULL, mu = NULL, sigma = NULL, maxit = 5000)
I'm able to extract mu, sigma and lambda parameters of the curves and store them into a table (FIT). I am showing below the parameters of both modes for the sample "2.7":
A tibble: 10 x 5
ID lambda mu sigma AspectR
<chr> <dbl> <dbl> <dbl> <dbl>
1 2.7 0.723 38.5 22.8 2.7
2 2.7 0.277 150. 20.2 2.7
I am able to generate a plot with a histogram and the 2 gaussians overlaid by running the following code:
p <- ggplot(df$orientation, aes(x = orientation)) +
geom_histogram(binwidth = 5) +
mapply(
function(mean, sd, lambda, n, binwidth){
stat_function(
fun = function(x){
(dnorm(x, mean = mean, sd = sd)) * n * binwidth * lambda
}
)
},
mean = FIT$mu,
sd = FIT$sigma,
lambda = FIT$lambda,
n = length(df$orientation),
binwidth = 5
)
})
Now what I need to do is to estimate the counts in each of the modes. My plan to do this is to compute the full width at half maximum (FWHM) of each of the modes, get the range and find out the counts in the ranges.
I tried applying what I saw here (Finding the full width half maximum of a peak) but this seems to not be for R and also in here (Determining the FWHM from a distribution in R), but in the latter it's for a unimodal distribution.
My feeling is that I may be able to apply some function in the mapply function written above that can compute the FWHM the same way it is done in Determining the FWHM from a distribution in R but all my attempts have failed.
Any suggestions?
Thank you and I hope this post is clear. Apologies if not, I'm an overflow gumby.
Will