I am trying to fetch multiple years of data (e.g., 2005 to 2007) from a database whose values are floats.

2005 values (sorted): [0.512, 0.768, 1, 1.5..., 100]

2006 values (sorted): [0.288, 0.512..., 300, 350] and so on.

I want to generate a CDF plot (using ax.hist()) that shows every year on a single graph. My current code looks like this:

num_bins = 100
fig, ax = plt.subplots(figsize=(8, 4))
years = ['2005', '2006', '2007']

for year in years:
    df = pd.read_sql_query(query, conn) #sorted
    n, bins, patches = ax.hist(df.values, num_bins, normed = 1, histtype='step', cumulative=True, label=str(year))

ax.grid(True)
ax.legend(loc='right')
ax.set_xlabel('Values')
ax.set_ylabel('CDF plot')
plt.show()

However, this gives me a single plot with multiple CDF histograms but an unsorted x-axis. My x-axis values are: 0.512, 0.768, 1, 1.5..., 100, 0.288, ..., 300, 350. The values first seen in the second year are appended after the first year's x-axis values instead of being plotted on the same scale.

How can I ensure that all CDF plots are drawn on a common, dynamically varying scale, so that the end result is: 0.288, 0.512, 0.768, 1, 1.5..., 100..., 300, 350?

Rarblack
Rg90
  • Well you don't seem to be subsetting your `DataFrame` in any way; the only thing you are changing within the loop on each iteration is the label. Also, though it's a personal preference, I'd suggest using your own function, which is easy to implement like [this](https://stackoverflow.com/questions/24788200/calculate-the-cumulative-distribution-function-cdf-in-python). A CDF really shouldn't be aliased by the bins, which happens when you use a cumulative histogram. – ALollz Oct 03 '18 at 13:52
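
The comment's suggestion of computing the CDF directly rather than through a cumulative histogram could look roughly like the sketch below. The function name `ecdf` and the sample values are illustrative assumptions, not taken from the linked answer.

```python
import numpy as np

def ecdf(values):
    """Return the sorted values and the empirical CDF evaluated at each one."""
    # Casting to float also guards against values that arrive as strings.
    x = np.sort(np.asarray(values, dtype=float))
    # The i-th smallest value (1-based) has cumulative probability i / n.
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Hypothetical sample resembling the 2005 values in the question.
x, y = ecdf([1.5, 0.512, 100.0, 0.768, 1.0])
```

Each year's `x, y` pair can then be drawn with something like `ax.step(x, y, where='post', label=year)`, so no bin count is involved and all years share one numeric x-axis.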

1 Answer

I found the problem:

 df = pd.read_sql_query(query, conn) #sorted

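This returned a data frame whose df.values was an array of sorted strings, not numbers. Matplotlib appears to treat string data as categories and place them along the axis in the order they are first seen, which would explain why the second year's new values were appended instead of being sorted into a numeric scale. After converting the values to numbers, the plots were generated correctly.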
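
A minimal sketch of that conversion, assuming the values arrive as strings. The `raw` dictionary is a hypothetical stand-in for the per-year query results, and using `pd.to_numeric` is one possible way to do the conversion, not necessarily what was used here. The shared bin edges are an extra step to keep every year on the same scale.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the per-year query results: values arrive as strings.
raw = {
    "2005": pd.Series(["0.512", "0.768", "1", "1.5", "100"]),
    "2006": pd.Series(["0.288", "0.512", "300", "350"]),
}

# Convert to floats; errors="coerce" turns unparseable entries into NaN,
# which dropna() then removes.
numeric = {year: pd.to_numeric(s, errors="coerce").dropna() for year, s in raw.items()}

# One set of bin edges spanning all years, so every CDF uses the same x scale.
lo = min(s.min() for s in numeric.values())
hi = max(s.max() for s in numeric.values())
edges = np.linspace(lo, hi, 101)  # 100 bins, matching num_bins in the question

# Cumulative, normalized histogram per year on the shared edges.
cdfs = {}
for year, s in numeric.items():
    counts, _ = np.histogram(s, bins=edges)
    cdfs[year] = np.cumsum(counts) / counts.sum()
```

Passing `bins=edges` to `ax.hist(..., cumulative=True, histtype='step')` inside the loop should give the same curves directly. Note that newer Matplotlib releases use `density=True` in place of `normed`.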

Rg90