Fix precision issues when displaying floats in Python

Question

I'm reading a text file of floating-point numbers with np.loadtxt. This is what my NumPy array looks like:

x = np.loadtxt(t2)
print(x)

array([[  1.00000000e+00,   6.61560000e-13],
       [  2.00000000e+00,   3.05350000e-13],
       [  3.00000000e+00,   6.22240000e-13],
       [  4.00000000e+00,   3.08850000e-13],
       [  5.00000000e+00,   1.11170000e-10],
       [  6.00000000e+00,   3.82440000e-11],
       [  7.00000000e+00,   5.39160000e-11],
       [  8.00000000e+00,   1.75910000e-11],
       [  9.00000000e+00,   2.27330000e-10]])

I separate out the first column from the second by doing this:

idx, coeffs = zip(*x)

Now, I want to create a mapping of id : coeff, something like this:

mapping = dict(zip(map(int, idx), coeffs))
print(mapping)

{1: 6.6155999999999996e-13,
 2: 3.0535000000000001e-13,
 3: 6.2223999999999998e-13,
 4: 3.0884999999999999e-13,
 5: 1.1117e-10,
 6: 3.8243999999999997e-11,
 7: 5.3915999999999998e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}

As you can see, precision errors have been introduced. For example, 6.61560000e-13 became 6.6155999999999996e-13.

This is what I would like, preferably:

{1: 6.61560000e-13,
 2: 3.05350000e-13,
 3: 6.22240000e-13,
 4: 3.08850000e-13,
 ...
 }

How can I do this? I am working on IPython3, if that helps.

Eager downvoter, can you explain what you don't like about this post? — cs95, Aug 15 '17 at 07:31
nmdv. Downvoting questions are cheap... back on topic, are you using `float32` when loading your array? because native python uses `double` — Jean-François Fabre, Aug 15 '17 at 07:34
note that you don't have the rounding issue if x is just a list of lists (python) — Jean-François Fabre, Aug 15 '17 at 07:37
@Jean-FrançoisFabre No. I just allow numpy to detect the dtype automatically. — cs95, Aug 15 '17 at 07:37
@Jean-FrançoisFabre Oh... your comment has given me a great idea! — cs95, Aug 15 '17 at 07:41
No precision errors have actually occurred. You're seeing the limitations of representing a number with a bit pattern, and how the value appears on screen depends on the formatting of the printed value. `format(6.6156000000000000e-13,".16e")` produces '6.6155999999999996e-13'. (Tested with Python 3.6). That's as close as can be to 6.6156e-13, which cannot be represented exactly. — Paul Cornelius, Aug 15 '17 at 07:49
@PaulCornelius Yes. I understand the limitations of fp representation. My question was pertaining to how to get it to display as numpy does it (figured out how to as well). — cs95, Aug 15 '17 at 07:50
OK, not to beat this point to death, but I'm curious now: how can the two instances of `mapping` produce two different printouts, unless their values are actually different? And if 6.6156e-13 can't be represented exactly, how does the printed value get rounded so nicely to 4 decimal places in the second case but not in the first one? The default formatting is, in fact, "6.6156e-13" and in order to get the longer form I had to use the format function. Can you explain what's going on? — Paul Cornelius, Aug 15 '17 at 08:01
@PaulCornelius: The key point to understand is that the `numpy.float64` type and the Python `float` type have different `__repr__` algorithms, so the exact same value can get displayed in two different ways. To see this, compare the `repr` of `np.float64(1.1)` with the `repr` of `1.1`. The actual values stored are identical in both cases, but the reprs are different. Python uses David Gay's algorithm, while NumPy uses "compute 17 significant digits then truncate trailing zeros". @cᴏʟᴅsᴘᴇᴇᴅ's solution works because `tolist` also converts NumPy floats to Python floats. — Mark Dickinson, Aug 15 '17 at 09:35

cs95 · Accepted Answer · 2017-08-15T10:42:37.380

4

Jean-François Fabre's comment gave me an idea, and I tried it out. Taking into consideration Alexander's suggestion to use a dict comprehension, this worked for me:

import numpy as np

x = np.loadtxt(t2)  # t2 is the path to the text file from the question
mapping = {int(k): v for k, v in x.tolist()}

print(mapping)

Output:

{1: 6.6156e-13,
 2: 3.0535e-13,
 3: 6.2224e-13,
 4: 3.0885e-13,
 5: 1.1117e-10,
 6: 3.8244e-11,
 7: 5.3916e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}

This works because the elements of x are of type np.float64. Calling .tolist() converts x to a list of lists whose elements are regular Python floats (C doubles). np.float64 and float store exactly the same value but have different __repr__ implementations. Python's float uses David Gay's algorithm to print the shortest string that round-trips to the same value, while NumPy's float64 repr here computes 17 significant digits and then strips trailing zeros.
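A minimal sketch of the point above, assuming a small hand-built array in place of the file read by np.loadtxt (the array literal and the variable names np_value and py_value are illustrative, not from the original post). It checks that .tolist() changes only the type of each element, not the stored value:

```python
import numpy as np

# Hypothetical stand-in for the array returned by np.loadtxt(t2).
x = np.array([[1.0, 6.6156e-13],
              [2.0, 3.0535e-13]])

np_value = x[0, 1]            # element taken straight from the array
py_value = x.tolist()[0][1]   # the same element after tolist()

print(type(np_value).__name__)  # float64
print(type(py_value).__name__)  # float

# The stored double is identical; only the type (and so the repr) differs.
same_value = bool(np_value == py_value)
same_bits = float(np_value).hex() == py_value.hex()
print(same_value, same_bits)    # True True

mapping = {int(k): v for k, v in x.tolist()}
print(mapping)                  # {1: 6.6156e-13, 2: 3.0535e-13}
```

How a np.float64 itself prints depends on the NumPy version, so the sketch only shows the output for the Python floats.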

edited Aug 15 '17 at 10:42

answered Aug 15 '17 at 07:42

cs95


It may be worth explaining _why_ this works: namely, that the `tolist` method converts NumPy `float64` instances into regular Python `float`s, and the two types have different `repr`s. – Mark Dickinson Aug 15 '17 at 10:39
@MarkDickinson Yes. I was aware of the Gay algorithm but did not realise numpy did not implement it. Thank you for your comment. I've added that bit. – cs95 Aug 15 '17 at 10:43

Alexander · Answer 2 · 2017-08-15T07:39:09.050

3

Not sure about the downvote.

After entering your data, you have already 'lost precision':

x = np.array([[  1.00000000e+00,   6.61560000e-13],
              [  2.00000000e+00,   3.05350000e-13],
              [  3.00000000e+00,   6.22240000e-13],
              [  4.00000000e+00,   3.08850000e-13],
              [  5.00000000e+00,   1.11170000e-10],
              [  6.00000000e+00,   3.82440000e-11],
              [  7.00000000e+00,   5.39160000e-11],
              [  8.00000000e+00,   1.75910000e-11],
              [  9.00000000e+00,   2.27330000e-10]])

>>> x[0, 1]
6.6155999999999996e-13

Perhaps a simple dict comprehension may be easier:

>>> {int(k): v for k, v in x}
{1: 6.6155999999999996e-13,
 2: 3.0535000000000001e-13,
 3: 6.2223999999999998e-13,
 4: 3.0884999999999999e-13,
 5: 1.1117e-10,
 6: 3.8243999999999997e-11,
 7: 5.3915999999999998e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}
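To see that the long digits are a matter of formatting rather than a change in the stored number, one coefficient can be printed several ways with plain Python. This is only a sketch. The names short, full and fixed are illustrative, and the ".16e" result matches the one quoted in the comments under the question:

```python
from decimal import Decimal

value = 6.6156e-13  # one coefficient from the question's data

short = repr(value)            # shortest string that round-trips
full = format(value, ".16e")   # 17 significant digits
fixed = format(value, ".8e")   # 9 significant digits, like the array printout

print(short)  # 6.6156e-13
print(full)   # 6.6155999999999996e-13
print(fixed)  # 6.61560000e-13

# Every one of these strings parses back to the very same double.
assert float(short) == float(full) == float(fixed) == value

# The exact decimal expansion of the stored double, for the curious.
print(Decimal(value))
```

The nearest double to 6.6156e-13 is fixed as soon as the text is parsed. Each display style simply shows a different number of its digits.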

edited Aug 15 '17 at 07:39

answered Aug 15 '17 at 07:37

Alexander


So it all comes back to the Gay algorithm for displaying floats. There is no way to fix this, correct? – cs95 Aug 15 '17 at 07:38
I figured out a way, using `.tolist()`. Take a look at my answer! – cs95 Aug 15 '17 at 07:43
That seems to have done the trick. Not sure what benefit `dtype=np.float64` adds? – Alexander Aug 15 '17 at 07:46
Lol... you're right. Was a remnant of some earlier attempts, but it seems to work without. – cs95 Aug 15 '17 at 07:47

score 0 · Answer 3 · answered Aug 15 '17 at 08:30

Following your own approach, you can cast the first column of the (float) input array to int and then build the dictionary by zipping it with the second column.

In [44]: dict(zip(np.asarray(x[:,0], dtype=int).tolist(), x[:,1].tolist()))
Out[44]: 
{1: 6.6156e-13,
 2: 3.0535e-13,
 3: 6.2224e-13,
 4: 3.0885e-13,
 5: 1.1117e-10,
 6: 3.8244e-11,
 7: 5.3916e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}

P.S. Using Python 3.6.1 in IPython 6.1.0
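If the fixed-width form shown in the question (for example 6.61560000e-13) is what is wanted, one option is to keep plain floats in the dictionary and apply a format only when printing. The sketch below assumes a small hand-built array in place of the loaded file. The names ids, coeffs, lines and text are illustrative:

```python
import numpy as np

# Hypothetical stand-in for the array returned by np.loadtxt(t2).
x = np.array([[1.0, 6.6156e-13],
              [2.0, 3.0535e-13],
              [5.0, 1.1117e-10]])

ids = x[:, 0].astype(int).tolist()  # Python ints
coeffs = x[:, 1].tolist()           # Python floats
mapping = dict(zip(ids, coeffs))

# Format at display time only; the stored values are left untouched.
lines = ["{}: {:.8e}".format(k, v) for k, v in mapping.items()]
text = "{" + ",\n ".join(lines) + "}"
print(text)
# {1: 6.61560000e-13,
#  2: 3.05350000e-13,
#  5: 1.11170000e-10}
```

The ".8e" format spec is a choice made here to match the question's desired output. Any other precision works the same way.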
