
I have two histograms.

int Hist1[10] = {1,4,3,5,2,5,4,6,3,2};

int Hist2[10] = {1,4,3,15,12,15,4,6,3,2};

Hist1's distribution is multi-modal.

Hist2's distribution is unimodal, with a single prominent peak.

My questions are:

  1. Is there any way I could determine the type of distribution programmatically?
  2. How can I quantify whether these two histograms are similar or dissimilar?
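For question 1, one crude way to label a histogram as unimodal or multi-modal is to count the separate regions that rise above some fraction of the tallest bin. The sketch below is a heuristic of my own choosing, not a standard statistical test: `count_modes` and its `frac` parameter are hypothetical names, and treating a "mode" as a contiguous run of bins strictly above `frac * max` is an assumption. With that rule, neighbouring peaks separated by a shallow dip (such as 15, 12, 15 in Hist2) merge into one mode.

```c
#include <stddef.h>

/* Heuristic mode counter (an assumption, not a standard test).
 * A "mode" is a contiguous run of bins whose count is strictly greater
 * than frac * (largest bin count). Peaks separated by a dip that stays
 * above the threshold merge into one mode; bumps below it are ignored. */
int count_modes(const int *hist, size_t n, double frac)
{
    int max = 0;
    for (size_t i = 0; i < n; i++)
        if (hist[i] > max)
            max = hist[i];
    if (max <= 0)
        return 0; /* empty histogram: no modes */

    double threshold = frac * max;
    int modes = 0;
    int in_run = 0; /* currently inside a run of bins above threshold? */
    for (size_t i = 0; i < n; i++) {
        int above = hist[i] > threshold;
        if (above && !in_run)
            modes++; /* a new run starts at bin i */
        in_run = above;
    }
    return modes;
}
```

With `frac = 0.5` (half the maximum), Hist1 gives 3 runs and Hist2 gives 1, which matches the labels above. The threshold is a judgment call: lower values split more dips into separate modes, higher values merge them.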

Thanks

Adrian McCarthy
Raj
  • You may find this question helpful: http://stackoverflow.com/questions/2661402/r-given-a-set-of-random-numbers-drawn-from-a-continuous-univariate-distribution (the answers, however, refer to the R programming environment). – gd047 May 27 '10 at 17:21

5 Answers


These are just guesses, but I would try fitting each histogram with a Gaussian distribution and using something like the R-squared value of the fit to determine whether the distribution is unimodal.

As for the similarity between the two distributions, I would try computing an autocorrelation and using its peak positive value as a similarity measure. These ideas are pretty rough, but hopefully they give you a starting point.

Justin Peel

For #2, you could calculate their cross-correlation (so long as the buckets themselves can be sorted). That would give you a rough estimate of their "similarity".

Frank Krueger

Raj,

I posted a C function in your other question ("automatically compare two series - Dissimilarity test") that computes the divergence between two sets of similar data. It's actually intended to tell you how closely real data matches predicted data, but I suspect you could use it for your purpose.

Basically, the smaller the error, the more similar the two sets are.
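The function from that other question isn't reproduced here. As a stand-in that follows the same "smaller error means more similar" idea, the sketch below computes a chi-square distance between the two histograms after normalizing each one to sum to 1. Choosing chi-square over another divergence is my assumption, and `chi_square_distance` is a hypothetical name.

```c
#include <stddef.h>

/* Chi-square distance between two n-bin histograms, each normalised to
 * sum to 1 first:  0.5 * sum_i (p_i - q_i)^2 / (p_i + q_i).
 * Bins empty in both histograms are skipped. Result is 0 for identical
 * shapes and 1 when the histograms share no non-empty bin.
 * Returns -1.0 if either histogram is empty. */
double chi_square_distance(const int *a, const int *b, size_t n)
{
    double total_a = 0.0, total_b = 0.0;
    for (size_t i = 0; i < n; i++) { total_a += a[i]; total_b += b[i]; }
    if (total_a <= 0.0 || total_b <= 0.0)
        return -1.0; /* undefined for an empty histogram */

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* normalise so histograms with different totals compare fairly */
        double p = a[i] / total_a;
        double q = b[i] / total_b;
        if (p + q > 0.0)
            sum += (p - q) * (p - q) / (p + q);
    }
    return 0.5 * sum;
}
```

Normalizing first means Hist2's larger total (65 versus 35) doesn't count as a difference by itself; only the shape does.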

oosterwal

Comparison of Histograms (For Use in Cloud Modeling).

(That's an MS .doc file.)

sigfpe

There are a variety of software packages that will "fit" your distributions to known discrete distributions for you - Minitab, STATA, R, etc. A reference to fitting distributions in R is here. I wouldn't advise programming this from scratch.

Regarding distribution comparisons, if neither distribution fits a known distribution (Poisson, Binomial, etc.), then you need to use non-parametric methods described here.
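The linked description of non-parametric methods isn't available here, so this is just one common example I am choosing: the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two cumulative distributions. The sketch assumes both histograms use the same ordered bins, and `ks_statistic` is a hypothetical name.

```c
#include <math.h>
#include <stddef.h>

/* Two-sample Kolmogorov-Smirnov statistic D for two n-bin histograms:
 * the maximum absolute difference between their normalised cumulative
 * distributions. D is 0 for identical shapes and at most 1.
 * Returns -1.0 if either histogram is empty. */
double ks_statistic(const int *a, const int *b, size_t n)
{
    double total_a = 0.0, total_b = 0.0;
    for (size_t i = 0; i < n; i++) { total_a += a[i]; total_b += b[i]; }
    if (total_a <= 0.0 || total_b <= 0.0)
        return -1.0; /* undefined for an empty histogram */

    double cdf_a = 0.0, cdf_b = 0.0, d = 0.0;
    for (size_t i = 0; i < n; i++) {
        cdf_a += a[i] / total_a; /* running cumulative fraction */
        cdf_b += b[i] / total_b;
        double gap = fabs(cdf_a - cdf_b);
        if (gap > d)
            d = gap;
    }
    return d;
}
```

For Hist1 and Hist2, D is about 0.2, occurring at bin 5. Binned data contains many ties, so the textbook KS p-values are only approximate here. It is safer to treat D as a descriptive distance than as an exact significance test.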

Grembo