Dirty print while printing a list in Python

Question

I'm a Python newbie and I've noticed something strange in such a basic function as print().

Let the code explain. I would like to save in a list all the outliers of an observation. So I've written the following snippet:

import numpy as np

def compute_outliers(obs):
    outliers=[]

    q1 = np.percentile(obs, 25)
    q3 = np.percentile(obs, 75)
    iqr = q3 - q1
    print('q1: ', q1)
    print('q3: ', q3)
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    for i in obs:
        if i < lower_limit or i > upper_limit:
            outliers.append(i)
    return outliers

outliers = compute_outliers(data)

where data is a feature (in the sense of "column") of a DataFrame object from the pandas library.

Now, if I type

for i in outliers:
    print(i)

The output is OK:

20.0
0.0
17.6
2.7
18.9
0.0
18.0

While, if I type:

print(outliers)

This is the output:

[20.0, 0.0, 17.600000000000001, 2.7000000000000002, 18.899999999999999, 0.0, 18.0]

You can see that some values (the third, the fourth and the fifth) are 'dirty'. I could simply use the first snippet for printing, but I'm curious about how all of this works, so I would like to know WHY this happens.

EDIT

I think that, to complete the question, it would be useful to know how to 'fix' this issue, i.e. how to print the list with the right values. Could you help?

score 3 · Answer 1 · edited May 23 '17 at 12:25

This effect results from a combination of these facts:

A list is a container type.
print(foo) uses str(foo), which calls foo.__str__().
A container’s __str__ uses contained objects’ __repr__.
Decimal fractions are not always precisely representable by binary floating point numbers.
float.__str__() rounds to make the decimal representation look nice, while float.__repr__() tries to preserve as much precision as feasible.
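
The first three facts can be checked without any floating-point subtlety. A minimal sketch, assuming a hypothetical class Reading (not part of the question's code) whose __str__ and __repr__ deliberately differ:

```python
class Reading:
    """Hypothetical wrapper with a short __str__ and a verbose __repr__."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return "{:.1f}".format(self.value)

    def __repr__(self):
        return "Reading({!r})".format(self.value)


single = str(Reading(17.6))      # what print(Reading(17.6)) would show
in_list = str([Reading(17.6)])   # what print([Reading(17.6)]) would show

print(single)
print(in_list)
```

The list falls back to each element's __repr__, which is the same mechanism that exposes the longer float representation in the question.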

answered Mar 18 '17 at 22:28

das-g

Useful answer, could you also take a look at the edit? – Bernheart Mar 18 '17 at 23:13
@Bernheart I don't see any edit. Maybe it's already been rejected before I had a look? – das-g Mar 19 '17 at 10:47

score 1 · Accepted Answer · edited May 23 '17 at 12:09

Yeah, it's a well-known floating-point issue, combined with some trickery with repr and str in Python.

If you use Python 2, you can try this:

print(0.1 + 0.2)
# 0.3
print([0.1 + 0.2])
# [0.30000000000000004]

This is because 0.1 + 0.2 is in fact not equal to 0.3 in IEEE 754 floating-point arithmetic. The float 0.1 is not exactly 1/10, because 1/10 cannot be written as a finite binary fraction at all.
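
One way to see this is to ask Python for the exact value it stores for 0.1. A small sketch using only the standard library (decimal.Decimal and float.as_integer_ratio); the variable names are just for illustration:

```python
from decimal import Decimal

exact_tenth = Decimal(0.1)         # exact decimal expansion of the stored double
ratio = (0.1).as_integer_ratio()   # same value as a fraction with a power-of-two denominator
sum_matches = (0.1 + 0.2 == 0.3)   # compares the stored doubles, not the decimal literals

print(exact_tenth)
print(ratio)
print(sum_matches)
```

The denominator of the fraction is a power of two, which is why the stored value can only approximate 1/10.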

When you invoke print on a number, it uses str() for that number. str() is a representation that aims at readability, and it can omit some "insignificant" digits to make the number more readable.

On the other hand, when you print a list, the algorithm that stringifies the list uses repr() for every item. repr() aims at exactness and reproducibility, so it provides all the digits that are needed to reconstruct the number. That does not mean it uses every digit of the stored value (e.g. repr(0.1) is still "0.1", not the "0.1000000000000000055511151" that can be obtained with print("%.25f" % 0.1)), but it can use more digits than str() does.
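
The digits shown in the question can be reproduced with explicit format specifiers. This sketch rests on an assumption: that the "nice" form corresponds to 12 significant digits and the long form to 17, as in older interpreters. Recent Python 3 versions use the shortest round-tripping string for both str() and repr(), so there the difference only appears with explicit formatting like this:

```python
values = [17.6, 2.7, 18.9]

short_form = ["%.12g" % v for v in values]   # 12 significant digits, readable
long_form = ["%.17g" % v for v in values]    # 17 significant digits, enough to round-trip a double

print(short_form)
print(long_form)

# Parsing the 17-digit strings gives back exactly the same floats.
round_trip_ok = all(float(s) == v for s, v in zip(long_form, values))
print(round_trip_ok)
```

The long strings are not more "wrong" than the short ones; they simply reveal more of the value that was stored all along.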

EDIT. If you want more user-friendly output when printing a list, you can format it manually with something like:

print(", ".join("{:.2f}".format(x) for x in outliers))
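
If the brackets and a reusable call are wanted, the same idea can be wrapped in a small helper. This is only a sketch: format_floats and its spec parameter are hypothetical names, not part of any library.

```python
def format_floats(values, spec=".2f"):
    """Return a list-like string with every float formatted using `spec`."""
    return "[" + ", ".join(format(v, spec) for v in values) + "]"


outliers = [20.0, 0.0, 17.6, 2.7, 18.9, 0.0, 18.0]  # values from the question

print(format_floats(outliers))           # fixed two decimals
print(format_floats(outliers, ".12g"))   # up to 12 significant digits, trailing zeros dropped
```

Note that the formatting only changes what is displayed; the floats stored in the list are untouched.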

See also this thread for different approaches and this site for more formatting options.

Hi Ilya and thank you for your explanation, which is quite deep for my knowledge: look at the edit :) – Bernheart Mar 18 '17 at 23:12
