
What is the best way to determine which type of distribution my data follows? I have daily data and want to run some scenarios and Monte Carlo simulations on it. I have been using np.random.normal for this, but I am not certain that the data are even normally distributed.

My end goal is to determine the most likely monthly score (for w, y, and x) by randomly generating values based on this daily data. The samples_w column applies only to the w column, and the samples_y_x column applies to the y and x columns.

Date    samples_w   w   samples_y_x  y    x
7/1/2013    12432   71.93%  60  71.67%  13.33%
7/2/2013    9657    75.49%  84  69.64%  10.71%
7/3/2013    8473    75.88%  80  64.38%  20.00%
7/4/2013    4555    74.18%  71  66.20%  14.08%
7/5/2013    4276    74.16%  88  68.47%  9.09%
7/6/2013    9929    74.24%  61  61.07%  1.64%
7/7/2013    8444    74.11%  28  82.14%  -10.71%
7/8/2013    8050    72.84%  71  71.48%  32.39%
7/9/2013    9806    74.22%  84  75.00%  11.90%
7/10/2013   9686    74.79%  53  68.87%  -15.09%
7/11/2013   5665    75.20%  62  68.55%  -14.52%
7/12/2013   3801    73.82%  84  70.54%  22.62%
7/13/2013   8459    73.50%  67  63.81%  -1.49%
7/14/2013   7435    74.90%  49  66.84%  0.00%
7/15/2013   7570    74.86%  58  68.10%  -1.72%
7/16/2013   8521    76.45%  54  66.67%  0.00%
7/17/2013   8733    76.73%  71  67.25%  12.68%
7/18/2013   6386    76.93%  84  69.05%  16.67%
7/19/2013   4786    75.70%  70  74.64%  17.14%
7/20/2013   8786    76.12%  80  66.56%  11.25%
7/21/2013   7785    73.65%  50  70.00%  8.00%
7/22/2013   7806    73.19%  67  78.54%  37.31%
7/23/2013   9560    76.14%  78  72.76%  3.85%
7/24/2013   10122   75.22%  66  71.97%  4.55%
7/25/2013   5054    74.79%  93  62.63%  8.60%
7/26/2013   4086    73.18%  104 71.63%  13.46%
7/27/2013   11949   68.04%  69  74.64%  17.39%
7/28/2013   10818   69.31%  44  67.05%  9.09%
7/29/2013   10217   70.14%  56  76.79%  30.36%
7/30/2013   10108   72.96%  93  66.40%  8.60%
7/31/2013   10250   75.67%  84  67.11%  36.90%
8/1/2013    5933    74.63%  52  64.42%  23.08%
8/2/2013    4409    74.60%  93  60.22%  -1.08%
8/3/2013    8815    74.54%  69  68.48%  21.74%
8/4/2013    8875    73.61%  53  55.66%  1.89%
8/5/2013    9645    72.92%  89  60.96%  -1.12%
8/6/2013    10284   74.73%  84  66.67%  -5.95%
8/7/2013    10524   75.20%  78  66.99%  8.97%
8/8/2013    5272    75.63%  85  72.94%  24.71%
8/9/2013    3846    76.50%  102 67.77%  0.98%
8/10/2013   8682    73.21%  76  65.13%  18.42%
8/11/2013   9432    74.55%  62  72.18%  19.35%
8/12/2013   10168   75.45%  42  51.49%  2.38%
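For concreteness, here is a minimal sketch of the kind of simulation described above, using only the first ten rows of the table. It rests on several assumptions that are not stated in the question: that the monthly score is an average of daily values, that a month has 31 days, and that days are independent. The first part draws days from a fitted normal, as with np.random.normal; the second part is a swapped-in alternative (a bootstrap that resamples observed days and weights them by samples_w) that avoids choosing a distribution at all. The names n_sims, n_days, normal_monthly and boot_monthly are hypothetical.

```python
import numpy as np

# First ten rows of the table above: w as a fraction, samples_w as daily counts.
w = np.array([0.7193, 0.7549, 0.7588, 0.7418, 0.7416,
              0.7424, 0.7411, 0.7284, 0.7422, 0.7479])
samples_w = np.array([12432, 9657, 8473, 4555, 4276,
                      9929, 8444, 8050, 9806, 9686])

rng = np.random.default_rng(0)   # seeded so the run is repeatable
n_sims, n_days = 10_000, 31      # assumed: 31-day month, 10,000 simulated months

# Approach 1: assume daily w is normal (the np.random.normal approach).
# The monthly score is taken to be the plain mean of the simulated days.
normal_days = rng.normal(w.mean(), w.std(ddof=1), size=(n_sims, n_days))
normal_monthly = normal_days.mean(axis=1)

# Approach 2 (alternative, not from the question): bootstrap the observed days.
# Each simulated month resamples days with replacement and weights each
# day's w by that day's samples_w.
idx = rng.integers(0, len(w), size=(n_sims, n_days))
boot_monthly = (w[idx] * samples_w[idx]).sum(axis=1) / samples_w[idx].sum(axis=1)

# Median and a central 95% interval for the simulated monthly score.
print("normal   :", np.percentile(normal_monthly, [2.5, 50, 97.5]))
print("bootstrap:", np.percentile(boot_monthly, [2.5, 50, 97.5]))
```

If the two intervals differ noticeably, that is a hint that the normal assumption matters for the result.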
ali_m
trench
  • http://stats.stackexchange.com/ – Harvey Aug 17 '15 at 18:40
  • You could try fitting a number of candidate distributions to the data and then check graphically, or calculate, which one fits best. For fitting, the fit() method of SciPy's continuous distributions could be used. An example of this approach is given by Saullo Castro in http://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python. –  Aug 17 '15 at 18:58
  • I would suggest you start out by plotting the distributions as histograms, then maybe taking a look at the [list of probability distributions on Wikipedia](https://en.wikipedia.org/wiki/List_of_probability_distributions). Since several of the columns are percentages I can tell you right away that those distributions must have finite support on the interval [0, 100]. You might also be able to take an educated guess based on how the data were generated. For example if they are counts of some random event occurring within a finite interval then they may be described by a Poisson distribution. – ali_m Aug 18 '15 at 16:33
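
The suggestions in the comments above could be sketched roughly as follows. This is only an illustration under assumptions: it uses the first ten y values from the table, converted to fractions, and it compares just two candidates, a normal distribution and a beta distribution pinned to [0, 1] (one common choice for proportions, not something the comments prescribe). The dictionary names candidates and results are hypothetical.

```python
import numpy as np
from scipy import stats

# First ten y values from the table above, converted from percent to fractions.
y = np.array([0.7167, 0.6964, 0.6438, 0.6620, 0.6847,
              0.6107, 0.8214, 0.7148, 0.7500, 0.6887])

# Fit each candidate by maximum likelihood.
# For the beta, loc and scale are fixed so the support is exactly [0, 1].
candidates = {
    "norm": stats.norm.fit(y),
    "beta": stats.beta.fit(y, floc=0, fscale=1),
}

results = {}
for name, params in candidates.items():
    dist = getattr(stats, name)
    # Higher log-likelihood means the fitted density explains the data better.
    loglik = dist.logpdf(y, *params).sum()
    # Kolmogorov-Smirnov test against the fitted distribution. The p-value is
    # optimistic here because the parameters were estimated from the same data.
    ks = stats.kstest(y, name, args=params)
    results[name] = (loglik, ks.statistic, ks.pvalue)
    print(f"{name}: loglik={loglik:.2f}, KS={ks.statistic:.3f}, p={ks.pvalue:.3f}")
```

With so few points, several distributions will usually fit about equally well, so a histogram or Q-Q plot of the full column is worth checking alongside these numbers.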
