Python pandas: issues when subsetting Series using .values attribute

Question

I'm having an issue with a pandas Series: I've created an array with some values in it. For testing purposes I wanted to check that certain values are present in the Series, so I'm subsetting it as shown below:

A = np.arange(start=-10, stop=10, step=0.1)
Aseries = pd.Series(A) 
Aseries[Aseries.values == 9]

and this returns an empty Series. But I just have to change the step (from 0.1 to 1) and then it works... I've double-checked that the Series actually contains the value I'm looking for (for both step values...)

Here's the code for when I change the step (With the output as proof)

#Generating an array containing 200 values from -10 to 10 with a step of 0.1
A = np.arange(start=-10, stop=10, step=0.1)
Aseries = pd.Series(A)
Aseries[Aseries.values == 9]

#Generating an array containing 20 values from -10 to 10 with a step of 1
B = np.arange(start=-10, stop=10, step=1)
Bseries = pd.Series(B)

print("'Aseries' having the value 9:")
print(Aseries[Aseries.values == 9])
print("'Bseries' having the value 9:")
print(Bseries[Bseries.values == 9])

output:

'Aseries' having the value 9:
Series([], dtype: float64)
'Bseries' having the value 9:
19    9
dtype: int32

Any idea what's going on here? Thanks in advance!

[EDIT]: For some reason I can't add another post to this thread, so I'll add the solution I found here. As @Quang Hoang and @Kim Rop explained, the problem comes from the non-integer step value, which doesn't return exactly the values you'd expect. So after:

Aseries = pd.Series(A)

I simply added a rounding instruction to keep only one decimal place for the values in the array, and adapted my subsetting operation to something like this:

Aseries[(Aseries.values > 8.9) & (Aseries.values < 9.1)]

I'm not having the issue anymore... Thanks @Quang Hoang and @Kim Rop
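
The rounding step described in this edit could look like the sketch below. It reuses A and Aseries from the question; the names Arounded and match, and the window bounds 8.95 and 9.05, are assumptions introduced here for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

A = np.arange(start=-10, stop=10, step=0.1)
Aseries = pd.Series(A)

# Round to one decimal so values that drifted slightly away from the
# intended grid snap back to it.
# (Arounded is a hypothetical name, not from the original post.)
Arounded = Aseries.round(1)

# Select a small window around 9 instead of testing exact equality.
match = Arounded[(Arounded.values > 8.95) & (Arounded.values < 9.05)]
print(match)
```

Using a window (or a tolerance) rather than == avoids depending on the exact bit pattern of a float.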

I think for a step of 0.1, linspace is the better choice; check https://stackoverflow.com/questions/477486/how-to-use-a-decimal-range-step-value – Kim Rop, Nov 05 '20 at 21:30

score 1 · Accepted Answer · answered Nov 05 '20 at 21:22

According to the official numpy.arange documentation:

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases.

This is also partly due to floating-point precision: the value that should be 9 is stored as a float very close to, but not exactly, 9, so == 9 matches nothing.
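
As a minimal sketch of both suggestions, the snippet below compares with a tolerance via np.isclose and builds the same grid with np.linspace. The names close_to_9, L and Lseries are assumptions made for this illustration, and np.isclose is used with its default tolerances.

```python
import numpy as np
import pandas as pd

A = np.arange(start=-10, stop=10, step=0.1)
Aseries = pd.Series(A)

# Tolerance-based comparison instead of exact equality.
close_to_9 = Aseries[np.isclose(Aseries.values, 9)]
print(close_to_9)

# The same grid built with linspace: 200 points from -10 up to (but not
# including) 10, mirroring arange's half-open interval.
L = np.linspace(-10, 10, num=200, endpoint=False)
Lseries = pd.Series(L)
print(Lseries[np.isclose(Lseries.values, 9)])
```

Even with linspace, comparing floats through np.isclose is the safer habit, since exact equality on computed floats is not guaranteed in general.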

Quang Hoang

`[print(x, x == 9, np.isclose(x, 9)) for x in Aseries.values]` this example might help to reproduce the behavior mentioned in this answer. – mlang Nov 05 '20 at 21:30
