1

I'm reading a text file containing float numbers using np.loadtxt. This is what my NumPy array looks like:

x = np.loadtxt(t2)
print(x)

array([[  1.00000000e+00,   6.61560000e-13],
       [  2.00000000e+00,   3.05350000e-13],
       [  3.00000000e+00,   6.22240000e-13],
       [  4.00000000e+00,   3.08850000e-13],
       [  5.00000000e+00,   1.11170000e-10],
       [  6.00000000e+00,   3.82440000e-11],
       [  7.00000000e+00,   5.39160000e-11],
       [  8.00000000e+00,   1.75910000e-11],
       [  9.00000000e+00,   2.27330000e-10]])

I separate out the first column from the second by doing this:

idx, coeffs = zip(*x)

Now, I want to create a mapping of id : coeff, something like this:

mapping = dict(zip(map(int, idx), coeffs))
print(mapping)

{1: 6.6155999999999996e-13,
 2: 3.0535000000000001e-13,
 3: 6.2223999999999998e-13,
 4: 3.0884999999999999e-13,
 5: 1.1117e-10,
 6: 3.8243999999999997e-11,
 7: 5.3915999999999998e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}

As you can see, precision errors have been introduced. For example, 6.61560000e-13 became 6.6155999999999996e-13.

This is what I would like, preferably:

{1: 6.61560000e-13,
 2: 3.05350000e-13,
 3: 6.22240000e-13,
 4: 3.08850000e-13,
 ...
 }

How can I do this? I am working on IPython3, if that helps.

cs95
  • Eager downvoter, can you explain what you don't like about this post? – cs95 Aug 15 '17 at 07:31
  • nmdv. Downvoting questions are cheap... back on topic, are you using `float32` when loading your array? because native python uses `double` – Jean-François Fabre Aug 15 '17 at 07:34
  • note that you don't have the rounding issue if x is just a list of lists (python) – Jean-François Fabre Aug 15 '17 at 07:37
  • @Jean-FrançoisFabre No. I just allow numpy to detect the dtype automatically. – cs95 Aug 15 '17 at 07:37
  • @Jean-FrançoisFabre Oh... your comment has given me a great idea! – cs95 Aug 15 '17 at 07:41
  • No precision errors have actually occurred. You're seeing the limitations of representing a number with a bit pattern, and how the value appears on screen depends on the formatting of the printed value. `format(6.6156000000000000e-13,".16e")` produces '6.6155999999999996e-13'. (Tested with Python 3.6). That's as close as can be to 6.6156e-13, which cannot be represented exactly. – Paul Cornelius Aug 15 '17 at 07:49
  • @PaulCornelius Yes. I understand the limitations of fp representation. My question was pertaining to how to get it to display as numpy does it (figured out how to as well). – cs95 Aug 15 '17 at 07:50
  • OK, not to beat this point to death, but I'm curious now: how can the two instances of `mapping` produce two different printouts, unless their values are actually different? And if 6.6156e-13 can't be represented exactly, how does the printed value get rounded so nicely to 4 decimal places in the second case but not in the first one? The default formatting is, in fact, "6.6156e-13" and in order to get the longer form I had to use the format function. Can you explain what's going on? – Paul Cornelius Aug 15 '17 at 08:01
  • @PaulCornelius: The key point to understand is that the `numpy.float64` type and the Python `float` type have different `__repr__` algorithms, so the exact same value can get displayed in two different ways. To see this, compare the `repr` of `np.float64(1.1)` with the `repr` of `1.1`. The actual values stored are identical in both cases, but the reprs are different. Python uses David Gay's algorithm, while NumPy uses "compute 17 significant digits then truncate trailing zeros". @cᴏʟᴅsᴘᴇᴇᴅ's solution works because `tolist` also converts NumPy floats to Python floats. – Mark Dickinson Aug 15 '17 at 09:35
  • @Mark Dickinson Very interesting. Thanks for the info. – Paul Cornelius Aug 15 '17 at 21:12
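
To make the point from the comments concrete, here is a minimal sketch (not part of the original posts) showing that one stored double can be written out in two ways that both parse back to the same value. The variable names are illustrative only.

```python
from decimal import Decimal

value = 6.6156e-13

# Python's float repr: the shortest string that round-trips to the same double.
short = repr(value)

# 17 significant digits, as in Paul Cornelius's format() call above.
long_form = format(value, ".16e")

# Both strings parse back to exactly the same float.
same = float(short) == float(long_form) == value

print(short)           # 6.6156e-13
print(long_form)       # 6.6155999999999996e-13
print(same)            # True
print(Decimal(value))  # the exact binary value actually stored
```

Neither string is the exact stored value; `Decimal(value)` prints that, and it has many more digits than either form.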

3 Answers

4

Jean-François Fabre's comment gave me an idea, and I tried it out. Taking into consideration Alexander's suggestion to use a dict comprehension, this worked for me:

x = np.loadtxt(t2)
mapping = {int(k) : v for k, v in x.tolist()}

print(mapping)

Output:

{1: 6.6156e-13,
 2: 3.0535e-13,
 3: 6.2224e-13,
 4: 3.0885e-13,
 5: 1.1117e-10,
 6: 3.8244e-11,
 7: 5.3916e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}

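This works because the elements of x are of type np.float64. Calling .tolist() converts x to a list of lists whose elements are ordinary Python floats (C doubles under the hood). np.float64 and float have different __repr__ implementations: Python's float uses David Gay's algorithm to produce the shortest string that round-trips to the same value, while NumPy (at the time of writing) uses a much simpler approach, computing 17 significant digits and then stripping trailing zeros. The stored values are identical in both cases; only the displayed text differs.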
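
A minimal sketch of the type conversion described above, assuming a small hand-written array in place of the file loaded with np.loadtxt. Newer NumPy releases may print their scalars differently, so the sketch checks types rather than relying on NumPy's own repr.

```python
import numpy as np

# Stand-in for np.loadtxt(t2); the values are copied from the question.
x = np.array([[1.0, 6.6156e-13],
              [2.0, 3.0535e-13]])

# Indexing the array yields a NumPy scalar...
numpy_value = x[0, 1]
# ...while tolist() yields built-in Python floats.
python_value = x.tolist()[0][1]

print(type(numpy_value).__name__)   # float64
print(type(python_value).__name__)  # float

# The dict built from tolist() therefore prints with Python's float repr.
mapping = {int(k): v for k, v in x.tolist()}
print(mapping)  # {1: 6.6156e-13, 2: 3.0535e-13}
```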

cs95
  • It may be worth explaining _why_ this works: namely, that the `tolist` method converts NumPy `float64` instances into regular Python `float`s, and the two types have different `repr`s. – Mark Dickinson Aug 15 '17 at 10:39
  • @MarkDickinson Yes. I was aware of the Gay algorithm but did not realise numpy did not implement it. Thank you for your comment. I've added that bit. – cs95 Aug 15 '17 at 10:43
3

Not sure about the downvote.

After entering your data, you have already 'lost precision':

x = np.array([[  1.00000000e+00,   6.61560000e-13],
              [  2.00000000e+00,   3.05350000e-13],
              [  3.00000000e+00,   6.22240000e-13],
              [  4.00000000e+00,   3.08850000e-13],
              [  5.00000000e+00,   1.11170000e-10],
              [  6.00000000e+00,   3.82440000e-11],
              [  7.00000000e+00,   5.39160000e-11],
              [  8.00000000e+00,   1.75910000e-11],
              [  9.00000000e+00,   2.27330000e-10]])

>>> x[0, 1]
6.6155999999999996e-13

Perhaps a simple dict comprehension may be easier:

>>> {int(k): v for k, v in x}
{1: 6.6155999999999996e-13,
 2: 3.0535000000000001e-13,
 3: 6.2223999999999998e-13,
 4: 3.0884999999999999e-13,
 5: 1.1117e-10,
 6: 3.8243999999999997e-11,
 7: 5.3915999999999998e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}
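
As a hedged check on the 'lost precision' above (a sketch, not part of the original answer): the dict built from the array and the dict built from `x.tolist()` hold the same doubles, and both match what `float()` parses from the original text. Only the printed form differs.

```python
import numpy as np

x = np.array([[1.0, 6.61560000e-13],
              [2.0, 3.05350000e-13]])

as_numpy = {int(k): v for k, v in x}            # values are np.float64
as_python = {int(k): v for k, v in x.tolist()}  # values are Python floats

# Same keys and same stored doubles, so the dicts compare equal.
print(as_numpy == as_python)  # True

# The stored value is exactly what float() parses from the file's text.
print(float("6.61560000e-13") == x[0, 1])  # True
```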
Alexander
0

Following your own approach, you can cast the first (float) column to an int array and then build the dictionary by zipping it with the second column.

In [44]: dict(zip(np.asarray(x[:,0], dtype=int).tolist(), x[:,1].tolist()))
Out[44]: 
{1: 6.6156e-13,
 2: 3.0535e-13,
 3: 6.2224e-13,
 4: 3.0885e-13,
 5: 1.1117e-10,
 6: 3.8244e-11,
 7: 5.3916e-11,
 8: 1.7591e-11,
 9: 2.2733e-10}

P.S. Using Python 3.6.1 in IPython 6.1.0
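
If the goal is specifically the fixed eight-decimal display shown in the question, one option is to keep the floats as they are and format them only when printing. This is a sketch under that assumption, using `astype(int)` as a shorthand for the `np.asarray(..., dtype=int)` cast above and a two-row sample array.

```python
import numpy as np

x = np.array([[1.0, 6.6156e-13],
              [5.0, 1.1117e-10]])

ids = x[:, 0].astype(int).tolist()  # Python ints
coeffs = x[:, 1].tolist()           # Python floats
mapping = dict(zip(ids, coeffs))

# Format at display time; the stored values are left untouched.
for key, value in mapping.items():
    print(f"{key}: {value:.8e}")
# 1: 6.61560000e-13
# 5: 1.11170000e-10
```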

kmario23