
I need to draw the CDF of integer values read from a file. I am following the example here, but I am not sure how to normalize the data to get a PDF and then compute the CDF from it.

import numpy as np
from pylab import *

with open("D:/input_file.txt", "r") as f:
    data = f.readlines()
    X = [int(line.strip()) for line in data]
    Y = exp([-x**2 for x in X])  # is this correct?

    # Normalize the data to a proper PDF
    Y /= ... # not sure what to write here

    # Compute the CDF
    CY = ... # not sure what to write here

    # Plot both
    plot(X,Y)
    plot(X,CY,'r--')

    show()
SaadH

1 Answer


One way to do this is to estimate the probability density function (PDF) with NumPy and then derive the cumulative distribution function (CDF) from it.

import numpy as np
# -----------------
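data = [88, 93, 184, 91, 107, 170, 88, 107, 167, 90]
# -----------------
# get PDF (density=True replaces the older normed=True, which NumPy 1.24 removed).
# ydata: bin heights; xdata: bin edges (one element longer than ydata)
ydata, xdata = np.histogram(data, bins=np.size(data), density=True)
# ----------------
# get CDF: cumulative sum of (density * bin width)
cdf = np.cumsum(ydata * np.diff(xdata))
# -----------------
print('Sum:', np.sum(ydata * np.diff(xdata)))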

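NumPy's `histogram` function gives the PDF estimate (bin heights normalized so that the total area under the histogram is 1). The CDF is then the cumulative sum of each bin's height multiplied by its width.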
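A minimal sketch of the same histogram approach for Python 3 and current NumPy, where `density=True` is the keyword to use (recent releases no longer accept `normed`). It assumes the sample `data` list above and checks that the bin probabilities sum to 1 and that the CDF ends at 1:

```python
import numpy as np

data = [88, 93, 184, 91, 107, 170, 88, 107, 167, 90]

# PDF estimate: heights are counts / (N * bin width), so the histogram's area is 1
density, edges = np.histogram(data, bins=len(data), density=True)

# Probability mass in each bin = density * bin width
bin_probs = density * np.diff(edges)

# CDF evaluated at the right edge of each bin
cdf = np.cumsum(bin_probs)

print("Total probability:", bin_probs.sum())
print("CDF at last edge:", cdf[-1])
```

Because `cdf` is built from probabilities rather than raw densities, its last value should be 1 up to floating-point rounding, whatever the bin widths are.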

msi_gerva
  • How do I draw the PDF and CDF from here? `plt.plot(xdata,ydata)` throws an error: `x and y must have same first dimension, but have shapes (11L,) and (10L,)` – SaadH Sep 28 '18 at 15:15
  • Yes, they have different sizes: `xdata` is one element longer. This is how `np.histogram` works: it returns the bin edges, i.e. the start and end x-coordinate of every bar. To draw the figure with `plot`, I would use the bin centers instead, i.e. `xplot = 0.5*(xdata[0:-1]+xdata[1:])` and then `plot(xplot,ydata)`. – msi_gerva Sep 28 '18 at 16:37
  • The figure comes up now, but I don't think the PDF values on the y-axis are correct; they seem too low. For example, if data = [70,70,90,90], the y-axis value should be 0.5 for x = 70 and the same for x = 90, but the graph shows a PDF value of 0.1 for both x = 70 and x = 90. – SaadH Oct 02 '18 at 23:51
  • Yes, you are right. The value 0.1 is the normalized density, i.e. probability per unit of x (`dy/dx`). To get 0.5, multiply it by the bin width `dx`, i.e. `ydata*np.diff(xdata)`. Or simply leave the normalization off (the default) and divide the counts by the number of samples, i.e. `ydata,xdata = np.histogram(data,bins=np.size(data)); ydata = ydata/np.size(data)`. Hope this helps. – msi_gerva Oct 03 '18 at 08:48
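A small sketch that reproduces the numbers from this comment thread, assuming `data = [70, 70, 90, 90]` and one bin per sample as in the answer. It computes the density (0.1 per unit x), the per-bin probability (density times width, or counts divided by N), and the bin centers so that x and y have the same length for plotting. As a swapped-in alternative for integer data (not the histogram method above), it also builds an exact probability mass function with `np.unique`. The plotting call is only a comment, so the sketch runs without matplotlib:

```python
import numpy as np

data = [70, 70, 90, 90]

# Density-normalized histogram: heights are probability per unit x
density, edges = np.histogram(data, bins=len(data), density=True)
widths = np.diff(edges)  # every bin is 5 units wide for this data

# Probability of each bin: density * width
bin_probs = density * widths

# Same result from raw counts divided by the number of samples
counts, _ = np.histogram(data, bins=len(data))
bin_probs_from_counts = counts / len(data)

# Bin centers: one entry per bin, matching the length of density and bin_probs
centers = 0.5 * (edges[:-1] + edges[1:])

# Alternative for integer data: exact PMF at each observed value, then its CDF
values, value_counts = np.unique(data, return_counts=True)
pmf = value_counts / len(data)
cdf = np.cumsum(pmf)

# With matplotlib.pyplot imported as plt: plt.plot(centers, bin_probs)
print("density:", density)
print("bin probabilities:", bin_probs)
print("values:", values, "PMF:", pmf, "CDF:", cdf)
```

The `np.unique` route avoids choosing a bin count at all, which can be convenient when the values are already discrete integers; the histogram route is what the answer uses and generalizes to continuous data.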