I'm trying to plot the CDF for different lists of values I have (contained in different files). Here is the code:
import argparse, os, math
import fileLibrary
import numpy as np
import matplotlib.pyplot as plt
parser = argparse.ArgumentParser()
parser.add_argument('-d', type=str, dest='intervalsDir', help='the directory which contains all the intervals for the CDF')
arguments = parser.parse_args()
intervalsDir = arguments.intervalsDir
def cdf(dataList, groupName):
    dataLen = len(dataList)
    dataSet = sorted(set(dataList))
    bins = np.append(dataSet, dataSet[-1]+1)
    counts, binEdges = np.histogram(dataList, bins=bins, density=False)
    counts = counts.astype(float) / dataLen
    cdf = np.cumsum(counts)
    plt.plot(binEdges[0:-1], cdf, linestyle='--', color='b')
    plt.ylim((0,1))
    plt.ylabel("CDF")
    plt.xlabel(groupName)
    plt.grid(True)
    plt.show()
for fileName in os.listdir(intervalsDir):
    parsedFileName = fileName.split(".")
    xLabel = parsedFileName[0]
    filePath = intervalsDir + "/" + fileName
    dataList = fileLibrary.createFileList(filePath)
    myDataList = []
    for d in dataList:
        x = int(d)
I am getting the following result (for the first file, the other ones are similar):
Most of the values in the lists I'm computing the CDF from lie between 0 and 1000. There are far fewer larger values, and only 2 numbers are close to the last tick at 250000. Each list contains more than 60000 values. I would like my x axis to have different ticks, so that it mainly shows the smaller values. I'm new to Python and matplotlib, so I don't know how to do that properly. Thank you in advance for your help.
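To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one: keep the x axis linear up to a threshold and logarithmic beyond it, using matplotlib's `symlog` scale. The helper names `empirical_cdf` and `plot_cdf` and the default threshold of 1000 are assumptions made for illustration; the `linthresh` keyword assumes matplotlib 3.3 or newer.

```python
from collections import Counter


def empirical_cdf(values):
    """Return (xs, ps): the sorted distinct values and the fraction of data <= each."""
    counts = Counter(values)
    xs = sorted(counts)
    total = len(values)
    ps = []
    running = 0
    for x in xs:
        running += counts[x]
        ps.append(running / total)
    return xs, ps


def plot_cdf(values, label, linthresh=1000):
    """Plot the CDF on an x axis that is linear up to linthresh and log beyond it."""
    # Imported here so empirical_cdf stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    xs, ps = empirical_cdf(values)
    fig, ax = plt.subplots()
    ax.step(xs, ps, where="post", linestyle="--", color="b")
    # Unlike a plain log scale, 'symlog' accepts zeros. The keyword is
    # `linthresh` in matplotlib >= 3.3 (older releases spell it `linthreshx`).
    ax.set_xscale("symlog", linthresh=linthresh)
    ax.set_ylim(0, 1)
    ax.set_xlabel(label)
    ax.set_ylabel("CDF")
    ax.grid(True)
    return fig, ax
```

With the variables from the script above, this would be called as `fig, ax = plot_cdf(myDataList, xLabel)` followed by `plt.show()`. If the few large values do not need to be visible at all, calling `ax.set_xlim(0, 1000)` on an ordinary linear axis is a simpler alternative.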