
I'm trying to plot the CDF for different lists of values I have (contained in different files). Here is the code:

import argparse, os, math
import fileLibrary
import numpy as np
import matplotlib.pyplot as plt

parser = argparse.ArgumentParser()
parser.add_argument('-d', type=str, dest='intervalsDir', help='the directory which contains all the intervals for the CDF')
arguments = parser.parse_args()
intervalsDir = arguments.intervalsDir

def cdf(dataList, groupName):
    dataLen = len(dataList)
    dataSet = sorted(set(dataList))
    bins = np.append(dataSet, dataSet[-1]+1)
    counts, binEdges = np.histogram(dataList, bins=bins, density=False)
    counts = counts.astype(float) / dataLen
    cdf = np.cumsum(counts)
    plt.plot(binEdges[0:-1], cdf, linestyle='--',  color='b')
    plt.ylim((0,1))
    plt.ylabel("CDF")
    plt.xlabel(groupName)
    plt.grid(True)
    plt.show()

for fileName in os.listdir(intervalsDir):
    parsedFileName = fileName.split(".")
    xLabel = parsedFileName[0]
    filePath = intervalsDir + "/" + fileName
    dataList = fileLibrary.createFileList(filePath)
    myDataList = []
    for d in dataList:
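        x = int(d)
        myDataList.append(x)  # assumed continuation: the posted snippet stops at the line above
    cdf(myDataList, xLabel)   # assumed: plot one CDF per file, labelled with the file name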

I am getting the following result (for the first file; the other ones are similar): [plot image: Result]

The lists of values I'm computing the CDF from consist mostly of values between 0 and 1000. There are a few larger values, and only 2 numbers close to the last tick at 250000. Each list contains more than 60000 values. I would like my x axis to have different ticks, so that it mainly shows the smaller values. I'm new to Python and matplotlib, so I don't know how to do that properly. Thank you in advance for your help.

Gixuna
  • Can you replace `plt.plot(binEdges[0:-1], cdf, linestyle='--', color='b')` with `plt.semilogx(binEdges[0:-1], cdf, linestyle='--', color='b')` and see if it helps? This simply uses a logarithmic scale on the x-axis. Alternatively, if you don't want to switch to a log scale, you can try a discontinuous (broken) x-axis that shows the data only up to 1000 and then near the last x-values. Here is the link: https://stackoverflow.com/questions/32185411/break-in-x-axis-of-matplotlib – Sheldore Aug 20 '18 at 10:14
  • Your first solution was exactly what I wanted, thanks a lot! :) I will check out the second one as well! – Gixuna Aug 20 '18 at 10:19
  • Glad that it worked out for you! You are welcome – Sheldore Aug 20 '18 at 10:20
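
For anyone landing here later, the `semilogx` suggestion from the comments can be sketched roughly as below. This is only a minimal sketch, not the asker's final code: the helper names `empirical_cdf` and `plot_cdf_logx` and the `out_path` parameter are made up for illustration, and the `Agg` backend is selected only so the snippet runs without a display (drop that line and call `plt.show()` for interactive use). It also assumes that dropping non-positive values from the plot is acceptable, since a log axis cannot display 0.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def empirical_cdf(values):
    """Return the sorted unique values and the fraction of samples <= each one."""
    data = np.asarray(values)
    xs, counts = np.unique(data, return_counts=True)
    return xs, np.cumsum(counts) / data.size


def plot_cdf_logx(values, group_name, out_path=None):
    """Plot the empirical CDF with a logarithmic x-axis (hypothetical helper)."""
    xs, cdf = empirical_cdf(values)
    positive = xs > 0  # a log axis cannot show zero or negative values
    fig, ax = plt.subplots()
    ax.semilogx(xs[positive], cdf[positive], linestyle="--", color="b")
    ax.set_ylim(0, 1)
    ax.set_ylabel("CDF")
    ax.set_xlabel(group_name)
    ax.grid(True)
    if out_path is not None:
        fig.savefig(out_path)
    return ax


# Small made-up sample: mostly small values plus one very large outlier.
sample = [1, 1, 2, 10, 250000]
axis = plot_cdf_logx(sample, "intervals")
```

The CDF itself is still computed over all values, so the curve keeps the right heights even though points at 0 are not drawn. If the zeros matter, a symmetric-log scale or the broken-axis approach linked in the comments may be a better fit.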
