I've got files with irradiance data measured every minute, 24 hours a day. On a day without any clouds in the sky, the data shows a nice continuous bell curve. To find cloudless days, I have always plotted the data month by month with gnuplot and checked for nice bell curves.

I was wondering if there's a Python way to check whether the irradiance measurements form a continuous bell curve. I don't know if the question is too vague, but I'm simply looking for some ideas on that quest :-)

Peter S
  • http://stackoverflow.com/questions/11507028/fit-a-gaussian-function#11507723 and figure out how "good" the match is? – Jasper Mar 09 '16 at 20:12
  • For me the question is not clear. If you take the mean and standard deviation of your data, you will have the continuous curve. Do you want to see how well your data fits that curve? Is that it? – wesley.mesquita Mar 09 '16 at 20:13

2 Answers

For a normal distribution, there are normality tests.

In short, we exploit what we know about the shape of normal distributions to identify them.

  • The kurtosis of any normal distribution is 3 (equivalently, its excess kurtosis is 0, which is what some libraries report by default). Compute the kurtosis of your data and it should be close to 3.

  • The skewness of a normal distribution is zero, so your data should have a skewness close to zero.

  • More generally, you could compute a reference distribution and use a statistical divergence, such as a Bregman divergence, to assess the difference (divergence) between the distributions. Bin your data, create a histogram, and start with the Jensen-Shannon divergence.
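The skewness and kurtosis checks from the list above could be sketched roughly as follows. This is only an illustration on synthetic data: `sample` is a hypothetical stand-in for real measurements, and how close to 0 and 3 counts as "close enough" is left open. Note that `scipy.stats.kurtosis` returns the excess kurtosis unless `fisher=False` is passed.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for real measurements (assumption: not actual irradiance data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=10_000)

skewness = stats.skew(sample)                    # about 0 for a normal distribution
kurt = stats.kurtosis(sample, fisher=False)      # Pearson definition: about 3 for a normal
excess_kurt = stats.kurtosis(sample)             # default (Fisher) definition: about 0

print(f"skewness={skewness:.3f}, kurtosis={kurt:.3f}, excess kurtosis={excess_kurt:.3f}")
```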

With the divergence approach, you can compare to an arbitrary distribution. You might record a thousand sunny days and check if the divergence between the sunny day and your measured day is below some threshold.
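A minimal sketch of the histogram-plus-divergence idea, assuming SciPy's `scipy.spatial.distance.jensenshannon` is available (it returns the Jensen-Shannon distance, i.e. the square root of the divergence). The helper name `js_divergence`, the bin count, and the synthetic samples are all illustrative assumptions; a usable threshold would have to be calibrated on real sunny-day data.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(day, reference, bins=30):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of two samples.

    Hypothetical helper: both samples are binned on a shared grid first.
    """
    lo = min(day.min(), reference.min())
    hi = max(day.max(), reference.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(day, bins=edges)
    q, _ = np.histogram(reference, bins=edges)
    # jensenshannon normalizes the counts itself and returns the distance,
    # so squaring it gives the divergence.
    return jensenshannon(p, q, base=2) ** 2

# Synthetic stand-ins (assumption: not real irradiance measurements)
rng = np.random.default_rng(0)
reference = rng.normal(size=5000)          # plays the role of the "sunny day" reference
similar = rng.normal(size=1000)            # same shape as the reference
different = rng.uniform(-3, 3, size=1000)  # clearly different shape

d_same = js_divergence(similar, reference)
d_diff = js_divergence(different, reference)
print(f"similar: {d_same:.4f}, different: {d_diff:.4f}")
```

The sample that matches the reference should yield the smaller divergence, which is what a threshold test would rely on.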

Jacob Panikulam

Just to complement the given answer with a code example: one can use a Kolmogorov-Smirnov test to obtain a measure for the "distance" between two distributions. SciPy offers a neat interface for this, called kstest:

from scipy import stats
import numpy as np

data = np.random.normal(size=100)  # Our (synthetic) dataset
D, p = stats.kstest(data, "norm")  # Perform a one-sample Kolmogorov-Smirnov test (two-sided by default)

In the above example, D denotes the distance between the empirical distribution of our data and the standard normal distribution (norm with its default mean 0 and standard deviation 1; smaller is better), and p denotes the corresponding p-value. Other distributions can be tested similarly by substituting norm with those implemented in scipy.stats.
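Since "norm" without further arguments means the standard normal, data with a different mean or spread will be rejected even if it is perfectly Gaussian. One possible workaround, sketched here on synthetic data, is to pass the estimated mean and standard deviation through the `args` parameter of `kstest`. Keep in mind that estimating the parameters from the same data makes the reported p-value only approximate (it tends to be too optimistic).

```python
import numpy as np
from scipy import stats

# Synthetic Gaussian data that is NOT standard normal (assumption: illustrative only)
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=500)

# Compared against N(0, 1): large distance, tiny p-value
D_raw, p_raw = stats.kstest(data, "norm")

# Compared against a normal with the sample's own mean and standard deviation
mu, sigma = data.mean(), data.std(ddof=1)
D_fit, p_fit = stats.kstest(data, "norm", args=(mu, sigma))

print(f"standard normal: D={D_raw:.3f}, p={p_raw:.3g}")
print(f"fitted normal:   D={D_fit:.3f}, p={p_fit:.3g}")
```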

MPA