
I have an array of data values as follows:

0.000000000000000000e+00
3.617000000000000171e+01
1.426779999999999973e+02
2.526699999999999946e+01
4.483190000000000168e+02
7.413999999999999702e+00
1.132390000000000043e+02
8.797000000000000597e+00
1.362599999999999945e+01
2.080880900000000111e+04
5.580000000000000071e+00
3.947999999999999954e+00
2.615000000000000213e+00
2.458000000000000185e+00
8.204600000000000648e+01
1.641999999999999904e+00
5.108999999999999986e+00
2.388999999999999790e+00
2.105999999999999872e+00
5.783000000000000362e+00
4.309999999999999609e+00
3.685999999999999943e+00
6.339999999999999858e+00
2.198999999999999844e+00
3.568999999999999950e+00
2.883999999999999897e+00
7.307999999999999829e+00
2.515000000000000124e+00
3.810000000000000053e+00
2.829000000000000181e+00
2.593999999999999861e+00
3.963999999999999968e+00
7.258000000000000007e+00
3.543000000000000149e+00
2.874000000000000110e+00
................... and so on. 

I want to plot the probability density function (PDF) of these data values. I looked at the Wiki article and at scipy.stats.gaussian_kde, but I cannot tell whether what I get is correct. I am using Python. My code for a simple plot of the data is as follows:

from matplotlib import pyplot as plt
plt.plot(Data)

But now I want to plot the PDF (probability density function), and I cannot find a Python library to do so.

Jongware
KrunalParmar
  • Since you are working with *discrete* data, your PDF will be categorised into 'bins'. Creating these bins is difficult with doubles, because it is very hard to state equality on them, therefore your PDF as it currently stands will almost certainly look like a flat line (as it is counting N unique values). You need to introduce some way of comparing these like rounding etc. – Scott Stainton May 22 '16 at 11:03
  • OK, I can round it to 2 decimal places. How can I plot it then? @ScottStainton – KrunalParmar May 22 '16 at 11:07
  • After rounding, you would need to count the occurrence of each number, then divide that by the total amount of data you have, this gives you the probability for each value. Plotting this value is your PDF. – Scott Stainton May 22 '16 at 11:12
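
The rounding-and-counting approach described in the comments above could be sketched roughly as follows. The short `data` array is a hypothetical stand-in for the asker's values, and rounding to 2 decimals is the choice mentioned in the comments. Note that this yields relative frequencies per rounded value; to turn them into a density they would have to be divided by the rounding step (0.01 here).

```python
import numpy as np

# Hypothetical sample standing in for the asker's data array
data = np.array([3.617, 3.621, 2.458, 2.462, 2.458, 5.58, 5.579, 7.414])

# Round to 2 decimals so that nearly equal floats collapse to the same value
rounded = np.round(data, 2)

# Count how often each distinct rounded value occurs (values come back sorted)
values, counts = np.unique(rounded, return_counts=True)

# Relative frequency of each value; these sum to 1
probabilities = counts / counts.sum()

for v, p in zip(values, probabilities):
    print(f"{v:.2f}: {p:.3f}")
```

The pairs in `values` and `probabilities` can then be passed to a plotting call such as `plt.bar(values, probabilities, width=0.01)`.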

2 Answers


The dataset you provide is too small to allow for a reliable kernel density estimate. Therefore, I will demonstrate the procedure (if I understood correctly what you are trying to do) using another data set:

import numpy as np
import scipy.stats

# generate data samples
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)

A kernel density estimation can then be obtained by simply calling

scipy.stats.gaussian_kde(data,bw_method=bw)

where bw is an optional parameter that controls the kernel bandwidth. For this data set, the fits for three values of bw are shown below:

# test values for the bw_method option ('None' is the default value)
bw_values =  [None, 0.1, 0.01]

# generate a list of kde estimators for each bw
kde = [scipy.stats.gaussian_kde(data,bw_method=bw) for bw in bw_values]


# plot (normalized) histogram of the data
# (density=True replaces the deprecated normed argument)
import matplotlib.pyplot as plt
plt.hist(data, 50, density=True, facecolor='green', alpha=0.5)

# plot density estimates
t_range = np.linspace(-2,8,200)
for i, bw in enumerate(bw_values):
    plt.plot(t_range,kde[i](t_range),lw=2, label='bw = '+str(bw))
plt.xlim(-1,6)
plt.legend(loc='best')

[Figure: normalized histogram of the data with the KDE curves for the three bw values overlaid]

Note that larger bw values result in a smoother PDF estimate, at the cost (in this example) of suggesting that negative values are possible, which is not the case here.
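
One way to put a number on this trade-off is to ask each estimator how much probability it places below zero. The following is only a sketch: it regenerates the same exponential sample as above and uses `gaussian_kde.integrate_box_1d` and the `factor` attribute; the variable names `negative_mass` and `results` are introduced here for illustration.

```python
import numpy as np
import scipy.stats

# Same synthetic sample as in the answer (all values are non-negative)
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)

results = []
for bw in [None, 0.1, 0.01]:
    kde = scipy.stats.gaussian_kde(data, bw_method=bw)
    # Probability the estimate assigns to the interval (-inf, 0)
    negative_mass = kde.integrate_box_1d(-np.inf, 0)
    results.append(negative_mass)
    print(f"bw={bw}: factor={kde.factor:.3f}, mass below zero={negative_mass:.4f}")
```

Since every sample is positive, a narrower kernel leaks less mass across zero, so the printed value shrinks as bw decreases, while the curve itself becomes rougher.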

Stelios
  • I found a ; in the python code ```plt.hist(data, 50, normed=1, facecolor='green', alpha=0.5);``` – Y00 Nov 24 '21 at 10:22
  • also, according to the document "normed : bool, optional Deprecated; use the density keyword argument instead. " – Y00 Nov 24 '21 at 10:25

Use numpy.histogram

Example:

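import numpy as np
import matplotlib.pyplot as plt

# a is your data array
# density=True normalizes the histogram so that its area is 1
hist, bins = np.histogram(a, bins=100, density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)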
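
As a quick sanity check that such a histogram behaves like a density, the bar heights times the bin widths should sum to 1. This is a minimal sketch using a synthetic exponential sample as a hypothetical stand-in for `a`, and `density=True` as the normalization flag of `np.histogram`.

```python
import numpy as np

# Hypothetical stand-in for the data array `a`
rng = np.random.default_rng(0)
a = rng.exponential(scale=1.0, size=1000)

# Normalized histogram: hist holds density values, bins holds the 101 edges
hist, bins = np.histogram(a, bins=100, density=True)

# Area under the histogram = sum of (height * width) over all bins
bin_widths = np.diff(bins)
total_area = np.sum(hist * bin_widths)
print(total_area)
```

The individual `hist` values may exceed 1 when bins are narrow; only the total area is constrained.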
Han-Kwang Nienhuys