
I have a dataset with more than 10,000 distributions that look like the ones in red. I want to compare each of them with a reference distribution like the one in blue. Because some are unimodal and some are multimodal, I cannot use a t-test for all of them. So I am trying to detect the multimodal distributions in order to apply a conditional test (t-test for normal distributions, Mann-Whitney for multimodal distributions; if you have any other idea, please let me know). Is there any way to detect a multimodal distribution?
I am also thinking about splitting the modes when a distribution is multimodal and comparing each mode to the reference. Is this possible? I found the SO question "Calculate the modes in a multimodal distribution in R" but didn't find anything more recent.
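
To make the splitting idea concrete, here is a minimal base-R sketch, not a definitive method. It assumes a mode can be taken as a local maximum of a kernel density estimate and that the sample can be cut at the lowest-density point between two neighbouring peaks. The helper names `find_modes` and `split_by_mode`, the `min_rel_height` threshold, and the simulated data are all hypothetical, and the result depends on the bandwidth chosen by `density()`.

```r
# Hypothetical helper: modes as local maxima of a kernel density estimate.
# min_rel_height (assumed tuning parameter) drops peaks lower than this
# fraction of the tallest peak, so small bumps are not counted as modes.
find_modes <- function(x, min_rel_height = 0.1, ...) {
  d <- density(x, ...)
  # the slope changes sign from + to - at a local maximum
  turn <- diff(sign(diff(d$y)))
  peaks <- which(turn == -2) + 1
  peaks <- peaks[d$y[peaks] >= min_rel_height * max(d$y)]
  # antimode: lowest-density grid point between each pair of adjacent peaks
  cuts <- numeric(0)
  if (length(peaks) > 1) {
    for (i in seq_len(length(peaks) - 1)) {
      idx <- peaks[i]:peaks[i + 1]
      cuts <- c(cuts, d$x[idx[which.min(d$y[idx])]])
    }
  }
  list(modes = d$x[peaks], cuts = cuts)
}

# Hypothetical helper: split a sample into one piece per mode at the antimodes.
split_by_mode <- function(x, ...) {
  m <- find_modes(x, ...)
  if (length(m$cuts) == 0) return(list("1" = x))
  split(x, findInterval(x, m$cuts) + 1)
}

# Simulated bimodal sample and reference (assumptions, not the real data)
set.seed(1)
x <- c(rnorm(500, mean = 0), rnorm(500, mean = 6))
ref <- rnorm(500, mean = 0)

res <- find_modes(x)
parts <- split_by_mode(x)
# Compare each piece to the reference, here with a Mann-Whitney test
p_by_mode <- sapply(parts, function(p) wilcox.test(p, ref)$p.value)
```

Extra arguments such as `adjust = 2` are passed through to `density()`, which is the simplest way to make the peak detection more or less sensitive.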

[Figure: multimodal distribution]

[Figure: reference distribution]

I tried mclust to find how many modes there are, but it doesn't work well: it finds 2 modes even when the distribution looks unimodal.

library(mclust)
# Fit Gaussian mixtures; the number of components chosen by BIC is in clust$G
clust <- Mclust(data$sample_frequency)
clust$G
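
One possible reason for the mismatch is that Mclust counts Gaussian components, not modes. A mixture of two Gaussians with a common standard deviation sigma is unimodal whenever the two means are no more than 2 * sigma apart, whatever the weights. The sketch below restricts the fit to one or two equal-variance components and then applies that check. The helper `is_separated`, the choice of the "E" model, and the simulated `x` are assumptions for illustration, and passing the check is only a necessary condition for bimodality.

```r
library(mclust)

# Hypothetical helper: TRUE when two equal-variance Gaussian components are
# far enough apart that the mixture can have two modes
# (necessary condition: |mu1 - mu2| > 2 * sigma).
is_separated <- function(mu, sigma) abs(diff(mu)) > 2 * sigma

set.seed(1)
x <- rnorm(1000)  # simulated stand-in for data$sample_frequency (assumption)

# Restrict the search to 1 or 2 equal-variance components ("E" model)
fit <- Mclust(x, G = 1:2, modelNames = "E", verbose = FALSE)

if (fit$G == 2) {
  mu <- fit$parameters$mean
  sigma <- sqrt(fit$parameters$variance$sigmasq)
  possibly_bimodal <- is_separated(mu, sigma)
} else {
  possibly_bimodal <- FALSE
}
```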

I also tried dip.test:

library(diptest)
dip.test(b$sample_frequency)

but again the p-value is not always what I would expect from the plots (for example, plot 77 is significant at p = 0.001, whereas plot 79 only gets p = 0.076).
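
For reference, the conditional test described above could be sketched as follows. The long-format layout, the `plot_id` column, the `ref` vector, the helper `compare_to_ref`, and the simulated values are all assumptions, not the real data. The dip test only decides which test is run, and because there are more than 10,000 comparisons the resulting p-values are adjusted with `p.adjust`.

```r
library(diptest)

# Simulated stand-in for the real data (assumption): long format, one row per
# observation, with a hypothetical plot_id column identifying each distribution.
set.seed(1)
data <- data.frame(
  plot_id = rep(c("77", "78", "79"), each = 300),
  sample_frequency = c(rnorm(300, 0.50, 0.05),
                       rnorm(150, 0.30, 0.03), rnorm(150, 0.70, 0.03),
                       rnorm(300, 0.55, 0.05))
)
ref <- rnorm(300, 0.50, 0.05)  # hypothetical reference sample

# Hypothetical helper: choose the test from the dip test, then compare to ref.
compare_to_ref <- function(x, ref, alpha = 0.05) {
  dip_p <- dip.test(x)$p.value
  if (dip_p < alpha) {
    # unimodality rejected: rank-based Mann-Whitney test
    test <- "wilcoxon"
    p <- wilcox.test(x, ref)$p.value
  } else {
    test <- "t"
    p <- t.test(x, ref)$p.value
  }
  data.frame(dip_p = dip_p, test = test, p = p)
}

groups <- split(data$sample_frequency, data$plot_id)
res <- do.call(rbind, lapply(groups, compare_to_ref, ref = ref))

# Benjamini-Hochberg adjustment across all comparisons
res$p_adj <- p.adjust(res$p, method = "BH")
```

Choosing the test from a preliminary test changes the error rates of the second test, so this is only one option. The Mann-Whitney test does not require multimodality, so running it on every distribution would avoid the conditional step entirely.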

Any help/thought is welcome!

Thanks!

RCchelsie
  • You may want to look into [KL Divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) – DanY Nov 16 '21 at 22:05
  • One way to quickly sort the distributions would be to compute skewness and kurtosis and use those statistics to group them. It is not clear what your overall goal/purpose is. Remember that p-values are based on a single comparison, not 10,000. – dcarlson Nov 16 '21 at 23:00
  • @dcarlson I want to compare the distribution of each to the reference, so 77 vs ref, 78 vs ref, 79 vs ref, etc. Also I don't think skewness and kurtosis will work well for multimodal distribution. – RCchelsie Nov 17 '21 at 14:01
  • @DanY Interesting, I will look into that. Thanks! – RCchelsie Nov 17 '21 at 14:23

0 Answers