I'm using the pandas toolkit in Python, and I'm having an issue.
I have a list of values, `lst`, and to keep it simple let's say it holds only the first 20 natural numbers:
>>> lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
I then create a `DataFrame` by giving it a `Series` built from that list, like this:
>>> from pandas import DataFrame, Series
>>> df = DataFrame(Series(lst))
I want to use this to calculate the quantiles from 0.1 (10%) to 1 (100%), which I do using the `quantile` function from `DataFrame`:
>>> import numpy as np
>>> qts = df.quantile(np.linspace(.1,1,num=10,endpoint=True))
If I print `qts`, this is what appears:
        0
0.1   2.9
0.2   4.8
0.3   6.7
0.4   8.6
0.5  10.5
0.6  12.4
0.7  14.3
0.8  16.2
0.9  18.1
1.0  20.0
Now I want to store in variables the values for quantiles 0.3 and 0.7. After searching for how to do it, I came up with a solution using `loc` on the `DataFrame`, giving it the quantile label (`0.7`, for instance) and the column index of the series of values I want to consider. Since there's only one column, I do it like this:
>>> q_3 = qts.loc[0.7][0]
The problem is that Python gives me this error:
**KeyError: 'the label [0.7] is not in the [index]'**
But I know it exists, since if I print the `index` values, I get this:
>>> qts.index
Float64Index([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype='float64')
So the label is apparently in the index, but pandas says it isn't. What am I doing wrong?
If I try to print any other quantile value using this approach, rather than `0.3` or `0.7`, it works:
>>> qts.loc[0.1][0]
2.8999999999999999
>>> qts.loc[0.2][0]
4.8000000000000007
>>> qts.loc[0.4][0]
8.6000000000000014
>>> qts.loc[0.5][0]
10.5
>>> qts.loc[0.6][0]
12.4
>>> qts.loc[0.8][0]
16.200000000000003
>>> qts.loc[0.9][0]
18.100000000000001
>>> qts.loc[1][0]
20.0
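One way to check whether the labels really are the values they print as is to compare each one against the literal I would type. This is only a minimal sketch: it assumes the index labels are exactly the values returned by the same `np.linspace` call, and the names `labels`, `typed` and `mismatched` are made up for illustration.

```python
import numpy as np

# Same values that were passed to df.quantile() above
# (assumption: the resulting index holds exactly these floats).
labels = np.linspace(.1, 1, num=10, endpoint=True)

# The literals one would naturally type when looking a label up.
typed = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# repr() of a Python float shows enough digits to tell two
# nearly equal values apart, unlike the shortened index display.
for label, literal in zip(labels, typed):
    print(repr(float(label)), repr(literal), float(label) == literal)

# Collect the literals that do not match their label exactly.
mismatched = [literal for label, literal in zip(labels, typed)
              if float(label) != literal]
print(mismatched)
```

Any literal that ends up in `mismatched` is one that an exact label lookup cannot find, even though the index display shows it.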
Any thoughts?
I'm using Python 3.5, and pandas 0.20.3.
EDIT
Thanks for the feedback!
So, it's a float precision issue. Nevertheless, I was wondering: is there a better way to get the N-th element of the list of quantiles, rather than using `loc` as I did?
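For reference, here is a minimal sketch of two options that avoid exact float labels, assuming the same `lst`, `df` and `qts` as above. The first uses `iloc` for purely positional access; the second looks the label up with a tolerance via `np.isclose`. The helper `quantile_at` and the names `q_03` and `q_07` are hypothetical, introduced only for this example, and I'm not claiming either option is the recommended one.

```python
import numpy as np
from pandas import DataFrame, Series

lst = list(range(1, 21))  # the first 20 natural numbers
df = DataFrame(Series(lst))
qts = df.quantile(np.linspace(.1, 1, num=10, endpoint=True))

# Option 1: positional access. Row 2 is the third quantile (label ~0.3),
# row 6 is the seventh (label ~0.7); column 0 is the only column.
q_03 = qts.iloc[2, 0]
q_07 = qts.iloc[6, 0]

# Option 2: label lookup with a tolerance instead of exact equality.
def quantile_at(frame, q, column=0):
    """Return the value whose index label is approximately q.

    Hypothetical helper: raises IndexError if no label is close to q.
    """
    mask = np.isclose(frame.index.values, q)  # boolean array, one entry per row
    return frame.loc[mask, column].iloc[0]

print(q_03, q_07)
print(quantile_at(qts, 0.3), quantile_at(qts, 0.7))
```

Positional access is the simpler of the two when only the N-th row matters; the tolerance-based lookup is closer to what I was trying to do with `loc`, at the cost of scanning the whole index.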