17

I have the following code below that prints the PDF graph for a particular mean and standard deviation.

https://i.stack.imgur.com/5WRKO.jpg

Now I need to find the actual probability, of a particular value. So for example if my mean is 0, and my value is 0, my probability is 1. This is usually done by calculating the area under the curve. Similar to this:

http://homepage.divms.uiowa.edu/~mbognar/applets/normal.html

I am not sure how to approach this problem

import numpy as np
import matplotlib    
import matplotlib.pyplot as plt

def normal(power, mean, std, val):
    a = 1/(np.sqrt(2*np.pi)*std)
    diff = np.abs(np.power(val-mean, power))
    b = np.exp(-(diff)/(2*std*std))
    return a*b

pdf_array = []
array = np.arange(-2,2,0.1)
print array
for i in array:
    print i
    pdf = normal(2, 0, 0.1, i)
    print pdf
    pdf_array.append(pdf)

plt.plot(array, pdf_array)
plt.ylabel('some numbers')
plt.axis([-2, 2, 0, 5])
plt.show()

print 
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Raaj
  • 363
  • 1
  • 3
  • 12

3 Answers3

16

Unless you have a reason to implement this yourself. All these functions are available in scipy.stats.norm

I think you asking for the cdf, then use this code:

from scipy.stats import norm
print(norm.cdf(x, mean, std))
martinako
  • 2,690
  • 1
  • 25
  • 44
  • I am using C++. Unless you are telling me there is an eigen function – Raaj Feb 09 '17 at 06:25
  • OP: "This is usually done by calculating the area under the curve." That would be the CDF, not the PDF as this answer suggests. – sspaniel Mar 15 '23 at 00:00
8

If you want to write it from scratch:

class PDF():
    def __init__(self,mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma
        self.data = []

    def calculate_mean(self):
        self.mean = sum(self.data) // len(self.data)
        return self.mean

    def calculate_stdev(self,sample=True):
        if sample:
            n = len(self.data)-1
        else:
            n = len(self.data)
        mean = self.mean
        sigma = 0
        for el in self.data:
            sigma += (el - mean)**2
        sigma = math.sqrt(sigma / n)
        self.stdev = sigma
        return self.stdev

    def pdf(self, x):
        return (1.0 / (self.stdev * math.sqrt(2*math.pi))) * math.exp(-0.5*((x - self.mean) / self.stdev) ** 2)



Ana
  • 135
  • 1
  • 6
  • 1
    Actually, density at a point is **0**, you need the area of an interval to get the pdf. What you can do is take an _epsilon_ value, suppose **1e-6**, and integrate over the area between **x & x+e** to calculate the pdf at a point. It won't be accurate, but close. [integral-calculator](https://www.integral-calculator.com/) -> this site could help you reformulate your pdf equation. – Vaibhav Singh May 25 '20 at 18:37
  • you can add this equation on the site: _1/(sqrt(2*pi*sigma^2)) * exp(-(x-mu)^2/(2*sigma^2))_ and then put in the limits say 2 and 2.0000001, it will give you the equations. – Vaibhav Singh May 25 '20 at 18:51
4

The area under a curve y = f(x) from x = a to x = b is the same as the integral of f(x)dx from x = a to x = b. Scipy has a quick easy way to do integrals. And just so you understand, the probability of finding a single point in that area cannot be one because the idea is that the total area under the curve is one (unless MAYBE it's a delta function). So you should get 0 ≤ probability of value < 1 for any particular value of interest. There may be different ways of doing it, but a conventional way is to assign confidence intervals along the x-axis like this. I would read up on Gaussian curves and normalization before continuing to code it.