I'm a Python newbie and I've noticed something strange in a function as basic as print().
Let the code explain. I want to collect all the outliers of an observation in a list, so I've written the following snippet:
```python
import numpy as np

def compute_outliers(obs):
    """Return the values of obs that fall outside the 1.5 * IQR fences."""
    outliers = []
    q1 = np.percentile(obs, 25)
    q3 = np.percentile(obs, 75)
    iqr = q3 - q1
    print('q1: ', q1)
    print('q3: ', q3)
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    for i in obs:
        if i < lower_limit or i > upper_limit:
            outliers.append(i)
    return outliers

outliers = compute_outliers(data)
```
where data is a generic feature (i.e. a column) of a DataFrame object from the pandas library.
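To make the fence rule concrete, here is a small worked example on made-up numbers (the five-value list is hypothetical, not the actual data column). With NumPy's default linear interpolation, the quartiles of [1, 2, 3, 4, 100] are 2 and 4, so the IQR is 2, the fences are -1 and 7, and only 100 is flagged.

```python
import numpy as np

# Hypothetical sample, only to illustrate the 1.5 * IQR rule used above.
obs = [1.0, 2.0, 3.0, 4.0, 100.0]

q1, q3 = np.percentile(obs, [25, 75])  # linear interpolation: 2.0 and 4.0
iqr = q3 - q1                          # 2.0
lower_limit = q1 - 1.5 * iqr           # -1.0
upper_limit = q3 + 1.5 * iqr           # 7.0

# Same test as the loop in compute_outliers, written as a comprehension.
outliers = [x for x in obs if x < lower_limit or x > upper_limit]
print(outliers)  # [100.0]
```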
Now, if I type:
```python
for i in outliers:
    print(i)
```
The output is fine:
```
20.0
0.0
17.6
2.7
18.9
0.0
18.0
```
Whereas if I type:
```python
print(outliers)
```
This is the output:
```
[20.0, 0.0, 17.600000000000001, 2.7000000000000002, 18.899999999999999, 0.0, 18.0]
```
As you can see, some values (the third, fourth and fifth) look 'dirty'. I could simply print the values one by one with the loop above, but I'm curious about how all of this works, so I would like to know WHY this happens.
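A minimal sketch that reproduces both effects. It rests on two facts about Python: print(x) writes str(x), while the str() of a list formats each element with repr(). The 17-digit strings above look like what repr() produced for floats in older setups (Python 2.6 and earlier, or numpy.float64 values with NumPy older than 1.14); which of these applies to the code in the question is an assumption. Python 3.1+ and 2.7 use the shortest round-tripping repr, so the sketch uses format(x, ".17g") to recreate the old 17-digit spelling. The Demo class is hypothetical, added only for illustration.

```python
class Demo:
    """Hypothetical helper whose str() and repr() differ on purpose."""

    def __str__(self):
        return "str"

    def __repr__(self):
        return "repr"


d = Demo()
print(d)    # print() converts its argument with str()   -> str
print([d])  # a list's str() uses repr() of each element -> [repr]

# Recreate the old 17-significant-digit spelling of the 'dirty' values.
for x in (17.6, 2.7, 18.9):
    long_form = format(x, ".17g")
    # Both spellings parse back to the very same stored double.
    print(x, long_form, float(long_form) == x)
```

Each loop line prints the short and the 17-digit spelling side by side followed by True: the 'dirty' numbers are the same stored values as the clean ones, just written with more digits.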
EDIT
To complete the question, it would be useful to know how to 'fix' this, i.e. how to print the list with the correct-looking values. Could you help?
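One possible fix, as a minimal sketch: build the string yourself using str() on each element, which is the same conversion print(i) applies inside the loop. format_list is a hypothetical helper name, and the sample list just repeats the numbers from the question. The ".1f" variant assumes one decimal place is enough for this data. Because numpy.float64 subclasses float, both approaches also accept NumPy scalars, and the str() route avoids the np.float64(...) wrapping that NumPy 2 shows inside printed lists.

```python
def format_list(values):
    """Render a list like print() does, but with str() on every element
    instead of the repr() a list normally uses."""
    return "[" + ", ".join(str(v) for v in values) + "]"


outliers = [20.0, 0.0, 17.6, 2.7, 18.9, 0.0, 18.0]  # numbers from the question
print(format_list(outliers))  # [20.0, 0.0, 17.6, 2.7, 18.9, 0.0, 18.0]

# Alternative: fix the number of decimals explicitly.
print(", ".join(format(v, ".1f") for v in outliers))  # 20.0, 0.0, 17.6, 2.7, 18.9, 0.0, 18.0
```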