
I am creating a Naive Bayes classifier in Python that will be able to guess which month it is based on some weather data of a single day.

Currently the mean and standard deviation are used to classify the month; however, I figured that adding skewness and kurtosis might help improve the accuracy.

I am currently using scipy.stats.norm.cdf to calculate the probability, but I cannot seem to find any cdf function in Python that takes skewness and kurtosis into account.

I feel like I might be misunderstanding skewness and kurtosis. Skewness and kurtosis affect the shape of the cdf, and therefore I expected them to be accepted as parameters.

Is there something fundamentally wrong with my understanding of skewness, kurtosis and the cdf function? If not, then where can I find an implementation of the cdf function in Python that takes all these parameters into account?
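A minimal sketch of the setup described above, assuming a single weather feature (daily maximum temperature) and synthetic data; `temps_by_month`, `moments` and `gaussian_score` are hypothetical names, not the asker's code. It estimates all four sample moments per month with `scipy.stats.skew` and `scipy.stats.kurtosis`, then shows that a normal model has nowhere to put the last two. It uses the density `norm.pdf`, which is what Gaussian naive Bayes usually multiplies together as the per-feature likelihood.

```python
import numpy as np
from scipy import stats

# Hypothetical, synthetic data: daily max temperatures (deg C) for two months.
rng = np.random.default_rng(0)
temps_by_month = {
    "January": rng.normal(loc=3.0, scale=4.0, size=300),
    "July": rng.gamma(shape=4.0, scale=2.0, size=300) + 15.0,  # right-skewed
}

def moments(sample):
    """Return mean, sample std, skewness and excess kurtosis of a 1-D sample."""
    return (
        float(np.mean(sample)),
        float(np.std(sample, ddof=1)),
        float(stats.skew(sample)),
        float(stats.kurtosis(sample)),  # Fisher definition: normal -> 0
    )

month_stats = {month: moments(t) for month, t in temps_by_month.items()}

def gaussian_score(x, mean, std):
    # The normal model is parameterised only by mean and std; the skewness
    # and kurtosis estimated above cannot be passed to it.
    return stats.norm.pdf(x, loc=mean, scale=std)

x = 20.0
for month, (mean, std, skew, kurt) in month_stats.items():
    print(month, round(skew, 2), round(kurt, 2), gaussian_score(x, mean, std))
```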

Thijs Riezebeek
  • It might not solve your problem, but take a look at: http://scikit-learn.org/stable/modules/naive_bayes.html – Dietrich Nov 27 '15 at 22:05
  • In a normal distribution skewness and excess kurtosis are both zero, and therefore you will have to use a different kind of distribution if you want to define it from these parameters. – Leandro Caniglia Nov 28 '15 at 12:28

1 Answer


The normal distribution, which you use (scipy.stats.norm) and which is typically used to model the one-dimensional conditional distributions in Naive Bayes, is fully defined by just two parameters: its mean and standard deviation. There is no point in specifying skewness or kurtosis, as they are constant for this distribution (skewness is 0 and kurtosis is 3, i.e. excess kurtosis is 0).
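A quick check of that claim with SciPy, as a small sketch: whatever `loc` (mean) and `scale` (std) you pick, `norm.stats(..., moments="mvsk")` reports the same skewness and kurtosis. Note that SciPy reports excess (Fisher) kurtosis, so 3 has to be added to get the Pearson kurtosis of 3 mentioned above; this also reconciles the "zero" in the comment with the "3" here.

```python
import numpy as np
from scipy import stats

rows = []
for mean, std in [(0.0, 1.0), (12.5, 4.0), (-3.0, 0.5)]:
    m, v, s, k = stats.norm.stats(loc=mean, scale=std, moments="mvsk")
    # k is excess kurtosis (Fisher); add 3 for Pearson's definition.
    rows.append((float(m), float(np.sqrt(v)), float(s), float(k) + 3.0))

for mean, std, skew, kurt in rows:
    print(f"mean={mean:5.1f} std={std:4.1f} skew={skew:.1f} kurtosis={kurt:.1f}")
```

Only the first two columns change; skewness and kurtosis are fixed by the shape of the normal distribution itself.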

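What you are thinking of is probably the Pearson system of distributions, which is designed to fit more moments (mean, std, skewness and kurtosis). SciPy implements one member of that family, the Pearson type III distribution (scipy.stats.pearson3), which takes a skew parameter in addition to loc and scale; its kurtosis is then determined by the skewness.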

http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.pearson3.html
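A minimal moment-matching sketch, assuming you already have a mean, std and skewness for a feature (the numbers below are made up, and `pearson3_from_moments` is a hypothetical helper). In SciPy's parametrisation, `pearson3(skew, loc, scale)` has mean `loc`, standard deviation `scale` and skewness `skew`, so the three estimates can be plugged in directly; with `skew=0` it reduces to the normal distribution.

```python
import numpy as np
from scipy import stats

def pearson3_from_moments(mean, std, skew):
    """Frozen Pearson type III distribution with the given mean, std and skewness."""
    return stats.pearson3(skew, loc=mean, scale=std)

dist = pearson3_from_moments(mean=20.0, std=3.0, skew=1.0)
m, v, s, k = dist.stats(moments="mvsk")
# Kurtosis is not a free parameter: the excess kurtosis equals 1.5 * skew**2.
print(float(m), float(np.sqrt(v)), float(s), float(k))

# Skewed cdf versus the plain normal cdf with the same mean and std.
print(dist.cdf(22.0), stats.norm.cdf(22.0, loc=20.0, scale=3.0))
```

Because the kurtosis is tied to the skewness, this family lets you use a skewness estimate but not an independent kurtosis estimate.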

lejlot