31

I am maintaining a Python script that uses xlrd to retrieve values from Excel spreadsheets, and then do various things with them. Some of the cells in the spreadsheet are high-precision numbers, and they must remain as such. When retrieving the values of one of these cells, xlrd gives me a float such as 0.38288746115497402.

However, I need to get this value into a string later on in the code. Doing either str(value) or unicode(value) will return something like "0.382887461155". The requirements say that this is not acceptable; the precision needs to be preserved.

I've tried a couple things so far to no success. The first was using a string formatting thingy:

data = "%.40s" % (value) 
data2 = "%.40r" % (value) 

But both produce the same rounded number, "0.382887461155".

Upon searching around for people with similar problems on SO and elsewhere on the internet, a common suggestion was to use the Decimal class. But I can't change the way the data is given to me (unless somebody knows of a secret way to make xlrd return Decimals). And when I try to do this:

data = Decimal(value)

I get a TypeError: Cannot convert float to Decimal. First convert the float to a string. But obviously I can't convert it to a string, or else I will lose the precision.

So yeah, I'm open to any suggestions -- even really gross/hacky ones if necessary. I'm not terribly experienced with Python (more of a Java/C# guy myself) so feel free to correct me if I've got some kind of fundamental misunderstanding here.

EDIT: Just thought I would add that I am using Python 2.6.4. I don't think there are any formal requirements stopping me from changing versions; it just has to not mess up any of the other code.

jloubert
  • 974
  • 2
  • 8
  • 12
  • Can you change the format of the spreadsheet? – Katriel Aug 13 '10 at 23:42
  • 1
    try using py2.7 and repr() instead – arthurprs Aug 13 '10 at 23:43
  • @katrielalex: Probably not -- but what exactly did you have in mind? @arthurprs: Thanks, I'll check out 2.7. However, doesn't my example above with `data2 = "%.40r" % (value)` already implicitly call repr()? Unless repr() changed between 2.6.3 and 2.7... – jloubert Aug 13 '10 at 23:51
  • Well, you could do something silly like store the number as `0.38288746` and `115497402` in separate columns. – Katriel Aug 14 '10 at 00:07
  • @katrielalex Hmm, I think the values are computed from other equally-precise values elsewhere in the spreadsheet via some formulae. So splitting them up into seperate cells wouldn't work out too well. However (as I've also noted below), it appears that Python 2.7's implementation of repr() solves the problem and doesn't round. – jloubert Aug 14 '10 at 00:21
  • 1
    @jloubert: that's fine, it's only reading the final values that's the problem. It would be some nasty hacking in Excel, but perfectly possible. You should make sure `repr` works on a few test cases; remember that this isn't an issue with `repr` but with the innate limitations of floats on a computer. I have a feeling you may just be being lucky with `repr`, although I could be wrong! – Katriel Aug 14 '10 at 00:25
  • @katrielalex Alright, thanks for the heads up! I'll be sure to try this out on a few different machines when I come back to work on Monday. And yes, I know it would be possible in Excel - it's just an additional headache I would prefer to avoid, especially since the design of the worksheet isn't really my responsibility. I would have to convince others to do all that nasty hacking, and I don't think they would enjoy that :/ – jloubert Aug 14 '10 at 00:35
  • 3
    @jloubert: There's no need for Python 2.7. Yes, both the Python 2.7 and Python 2.6 (and earlier) implementations display a rounded value with repr, in the sense that they're not showing the exact value stored in the machine. But in both cases, that rounded value is deliberately chosen to be accurate enough that 'float(repr(x))' recovers x exactly. (I'm one of the implementers of the Python 2.7 float repr, so I have some idea what I'm talking about here. :) +1 for @John Machin's answer. – Mark Dickinson Aug 14 '10 at 07:49
  • @jloubert: What does "Some of the cells in the spreadsheet are high-precision numbers" really mean? What exactly do your requirements say? What is their definition of "precision"? How is the string representation of the float going to be used, other than for display? – John Machin Aug 14 '10 at 23:14
  • @John: The main task right now is to just pull data from spreadsheets, format them in a very specific way, then write that output to some files. So basically all that needs to be done is to convert the numbers to strings and output them. The resulting text files need to exactly reproduce the data in the spreadsheets. Many cells have numbers with 17 digits after the decimal; these must then be converted to strings with all 17 digits intact. This task may be extended soon to do more work with the data, which is why its nice to be able to keep the original `floats` around, but for now, that's it. – jloubert Aug 19 '10 at 21:02
  • @jloubert: Sorry, but I don't understand what you are trying to tell me. What is "very specific way"? Will you use repr() or not? What do you mean by "many cells have numbers with 17 digits after the decimal [point]"? Number cells are recorded as IEEE 754 64-bit binary floats (same as Python). They don't intrinsically have ANY decimal digits, after OR BEFORE the decimal point. Max precision is 53 bits, equivalent to 15.95 decimal digits. Python's repr() for v2.x and v3.x will exactly reproduce the data. I don't understand "keep original floats around"; keep where? why? – John Machin Aug 19 '10 at 23:29
  • I think you're imagining it to be more complex than it is. We have an application that reads text files as input. There are spreadsheets containing the data we want to use. Therefore the spreadsheets' data has to be turned into text files, with delimiters, row/column names, and various other formatting included. I know basically nothing about the internal storage methods of Excel; all I know is that I see cells containing a zero, a decimal point, and then 17 digits. When getting the `cell.value` of these, a `float` was returned. PythonWin showed this `float` as having the correct value. (cont) – jloubert Aug 20 '10 at 20:19
  • However, using the methods I described in the OP, I was unable to convert that accurate `float` into a string representation without the data changing; the number would get rounded. I upgraded to 2.7 and `repr()` returns a string that is identical to the value in the Excel cell, which solves the problem I had. The other thing I was getting at was that in the future I will likely have to do other operations with the data, which is where I will need the numbers as `floats`. I really shouldn't have mentioned this because it is still poorly defined and has only caused communication difficulties =/ – jloubert Aug 20 '10 at 20:49
  • @jloubert: I'm steadfastly refusing to use my imagination; that's why I've been asking you so many questions, like please show evidence to back up your assertion that Python 2.6 repr(some_float) "changes" or "rounds" values. – John Machin Aug 21 '10 at 06:22

5 Answers5

56

I'm the author of xlrd. There is so much confusion in other answers and comments to rebut in comments so I'm doing it in an answer.

@katriealex: """precision being lost in the guts of xlrd""" --- entirely unfounded and untrue. xlrd reproduces exactly the 64-bit float that's stored in the XLS file.

@katriealex: """It may be possible to modify your local xlrd installation to change the float cast""" --- I don't know why you would want to do this; you don't lose any precision by floating a 16-bit integer!!! In any case that code is used only when reading Excel 2.X files (which had an INTEGER-type cell record). The OP gives no indication that he is reading such ancient files.

@jloubert: You must be mistaken. "%.40r" % a_float is just a baroque way of getting the same answer as repr(a_float).

@EVERYBODY: You don't need to convert a float to a decimal to preserve the precision. The whole point of the repr() function is that the following is guaranteed:

float(repr(a_float)) == a_float

Python 2.X (X <= 6) repr gives a constant 17 decimal digits of precision, as that is guaranteed to reproduce the original value. Later Pythons (2.7, 3.1) give the minimal number of decimal digits that will reproduce the original value.

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
>>> f = 0.38288746115497402
>>> repr(f)
'0.38288746115497402'
>>> float(repr(f)) == f
True

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
>>> f = 0.38288746115497402
>>> repr(f)
'0.382887461154974'
>>> float(repr(f)) == f
True

So the bottom line is that if you want a string that preserves all the precision of a float object, use preserved = repr(the_float_object) ... recover the value later by float(preserved). It's that simple. No need for the decimal module.

John Machin
  • 81,303
  • 11
  • 141
  • 189
  • @John: thank you very much for the clarification, I have changed my answer to mention yours! However -- I have just rechecked this on my system and it still does not reproduce the functionality that I believe the OP wanted. Specifically, if I define `f = 0.38...97402` and then call `repr(f)`, I get `0.38...974`. Cast to a float, this number compares equal to `f` (they differ by less than my epsilon), but clearly does not maintain the full precision of the original definition. If I do `d = decimal.Decimal( "0.38...97402" )`, full precision is returned with `str(d)`. Have I missed something? – Katriel Aug 14 '10 at 12:22
  • Oh, I think I get it. Is the point you're making that Excel internally uses `double`s, same as Python, so that the apparent higher precision is an illusion? – Katriel Aug 14 '10 at 12:31
  • @katriealex: What Excel uses internally is a bit of a mystery, but in one of Bill Kahan's delicious diatribes (http://www.cs.berkeley.edu/~wkahan/Mindless.pdf) he shows that Excel appears to use "doubles" but stuffs about trying to pretend it's using 15-digit floating decimal arithmetic. However what matters is that number cells in an XLS file contain 64-bit IEEE floats (or a crufty compressed version thereof) and xlrd drags those out with no loss of precision. I thought the non-existent precision difference between the OP's 2.6 repr and your 2.7 repr had already been explained. – John Machin Aug 14 '10 at 13:29
  • 1
    John: yes, repr() is sufficient if you want to just print the precise value. However, I think the comments on converting to Decimal would still be relevant if you want to do additional calculations with the numbers with a higher level of precision. For example, if you read a bunch of rows with float values and then use float arithmetic to sum them you could get rounding errors. Converting those to Decimals before summing would prevent that. – Matt Good Aug 16 '10 at 22:44
  • @Matt Good: The OP has not responded to a request to say what he wants to do with the non-float representations of his "high precision numbers". *IF* the problem was loss of precision on summation, then one could use Kahan's (that man again!) accurate summation algorithm. What higher degree of precision? The numbers can't be made any more precise than they are already. – John Machin Aug 16 '10 at 23:04
  • Thanks for the clear response, John. I am 95% sure using `repr()` in Python 2.6.4 (both explicitly and via string formatting) still provided a rounded number. But it works as you say in 2.7, and my higher-ups are fine with upgrading versions, so I won't ask any more questions :) Matt Good is correct in that converting to `Decimals` will be necessary for further calculations, but that is an area in the software requirements that is still up in the air. I only need strings for the time being. – jloubert Aug 16 '10 at 23:58
  • 1
    @jloubert: "95% sure"??? It either does or it doesn't! Don't walk away from a problem! Produce evidence; repeat my Python 2.6.4 example and show what you get. Show (as I did) the Python banner line so that we can see exactly what Python implementation is "producing a rounded answer", if that happens. And a quick silly question: how did you know xlrd was producing `0.38288746115497402` without using `repr()`??? – John Machin Aug 17 '10 at 00:21
  • Well, I say 95% sure because I was relying on the debugging features of PythonWin, not directly typing lines into an interpreter. This debugger showed that the `floats` produced by `xlrd` were indeed accurate, and that the precision was lost upon using `str()` or `unicode()` or `repr()`. I'll see if I can repeat the results with 2.6.4 when I have some time later today. – jloubert Aug 17 '10 at 18:40
  • 1
    @jloubert: Still waiting for you to check your assertion. I don't understand why PythonWin debugger (itself an interpreter) is more reliable than typing lines into the official Python interpreter. Accurate when compared with what? Please do take a minute or two to run `python -c"f=0.38288746115497402;print repr(f), float(repr(f))==f"` at the command prompt using Python 2.6.4 – John Machin Aug 19 '10 at 23:39
  • Note that the identity doesn't work for numpy's lower-precision floats. Check out `import numpy; b = numpy.array([1.0 / 3.0], dtype=np.float16); float(repr(b[0])) - b[0]`. (Same for `float32`.) – Nico Schlömer Mar 09 '17 at 10:21
3

You can use repr() to convert to a string without losing precision, then convert to a Decimal:

>>> from decimal import Decimal
>>> f = 0.38288746115497402
>>> d = Decimal(repr(f))
>>> print d
0.38288746115497402
eldarerathis
  • 35,455
  • 10
  • 90
  • 93
  • doc says that it's only "guaranteed" in 2.7+ (and probably 3.1+) – arthurprs Aug 13 '10 at 23:48
  • I tried using repr() explicitly in 2.6.4, and it still rounded. I'm getting version 2.7 right now though to try that out. – jloubert Aug 13 '10 at 23:53
  • Hm, I'm actually not clear on what the deal is with `repr()`. It's definitely in the 2.6 docs: http://docs.python.org/release/2.6/library/functions.html?highlight=repr#repr – eldarerathis Aug 13 '10 at 23:55
  • Rounds in 3.1 and 2.6 for me. As expected, because this is under `sys.float_info.epsilon` on my system. – Katriel Aug 13 '10 at 23:57
  • Ugh, this is well within my epsilon, so I guess I can't be much more helpful. Anything I test probably won't be portable. – eldarerathis Aug 14 '10 at 00:02
  • doing that and later `str(d)` on it works for me. I'm on 2.6.5 according to `--version`. – avacariu Aug 14 '10 at 00:03
  • @vlad003: Well, what does `sys.float_info.epsilon` report for you? – eldarerathis Aug 14 '10 at 00:04
  • Alright, so repr() appears to do the trick. In Python 2.7 anyway. What is the deal with epsilons though? I mean I know what they are, but are they different from machine to machine? And if so, how can they be changed? – jloubert Aug 14 '10 at 00:17
  • @eldarerathis: 2.2204460492503131e-16 – avacariu Aug 14 '10 at 00:31
  • @jloubert: I *thought* that it was defined in the `` header somewhere, and was standardized by the IEEE. I haven't found anything that indicates a different answer yet, but I'm left to wonder what kind of system setup @katrielalex has that causes him to have a larger epsilon. Python just uses the value of `DBL_EPSILON` on your system, AFAIK. – eldarerathis Aug 14 '10 at 00:55
  • 1
    **decimal is NOT needed. epsilons are irrelevant. xlrd doesn't lose precision. don't hack xlrd code. just use repr(). See my answer** – John Machin Aug 14 '10 at 01:11
  • @arthurprs: Hmm. You're right. That sentence ("... repr() of a floating-point number x returns a result that’s guaranteed to round back to the same number when converted back to a string.") *is* rather misleading, since it suggests that this is a new feature when it's not. I'll fix it. Thanks! – Mark Dickinson Aug 15 '10 at 07:40
1

EDIT: I am wrong. I shall leave this answer here so the rest of the thread makes sense, but it's not true. Please see John Machin's answer above. Thanks guys =).

If the above answers work that's great -- it will save you a lot of nasty hacking. However, at least on my system, they won't. You can check this with e.g.

import sys
print( "%.30f" % sys.float_info.epsilon )

That number is the smallest float that your system can distinguish from zero. Anything smaller than that may be randomly added or subtracted from any float when you perform an operation. This means that, at least on my Python setup, the precision is lost inside the guts of xlrd, and there seems to be nothing you can do without modifying it. Which is odd; I'd have expected this case to have occurred before, but apparently not!

It may be possible to modify your local xlrd installation to change the float cast. Open up site-packages\xlrd\sheet.py and go down to line 1099:

...
elif rc == XL_INTEGER:
                    rowx, colx, cell_attr, d = local_unpack('<HH3sH', data)
                    self_put_number_cell(rowx, colx, float(d), self.fixed_BIFF2_xfindex(cell_attr, rowx, colx))
...

Notice the float cast -- you could try changing that to a decimal.Decimal and see what happens.

Katriel
  • 120,462
  • 19
  • 136
  • 170
  • Thanks for the answer! Obviously I'd prefer to avoid editing the `xlrd` library, as that will undoubtedly get silly; however, I'll definitely give this a shot if nothing else ends up working. – jloubert Aug 13 '10 at 23:58
  • 1
    @katriealex: Can you explain exactly what 'doesn't work' on your system? The suggestions to use `repr` should work fine. There's some misinformation here. `sys.float_info.epsilon` is *not* the smallest float that can be distinguished from zero; that value would be around `5e-324` on most machines. (Though IronPython currently does actually have this wrong value for sys.float_info.epsilon, I believe.) And the 'randomly added or subtracted' isn't really helpful either, and is a long way from describing what actually happens. – Mark Dickinson Aug 14 '10 at 08:02
  • @Mark: I have explained in a comment on your post (I'm not sure why =)). – Katriel Aug 14 '10 at 12:23
0

EDIT: Cleared my previous answer b/c it didn't work properly.

I'm on Python 2.6.5 and this works for me:

a = 0.38288746115497402
print repr(a)
type(repr(a))    #Says it's a string

Note: This just converts to a string. You'll need to convert to Decimal yourself later if needed.

avacariu
  • 2,780
  • 3
  • 25
  • 25
  • In Python 2.6.4, string._float() is still rounding. What version are you using? – jloubert Aug 13 '10 at 23:55
  • Rewritten my answer. I think that `string._float` is actually still a float. (I don't know why it's in the string module, but that's probably why there's an `_` at the beginning). Sorry for incorrect answer. – avacariu Aug 14 '10 at 00:00
  • -1 Missed the point -- the problem is that `float` has limited accuracy, not that its representation is flawed. And `repr( )` is a string. `Decimal` is definitely needed if you want to store the number in a Python datatype. – Katriel Aug 14 '10 at 00:08
  • I say not: "However, I need to get this value into a string later on in the code." And even using `Decimal` you'd still need to convert to a string. So I say my answer doesn't miss the point. OP needed to convert to a string without the rounding error. – avacariu Aug 14 '10 at 00:21
0

As has already been said, a float isn't precise at all - so preserving precision can be somewhat misleading.

Here's a way to get every last bit of information out of a float object:

>>> from decimal import Decimal
>>> str(Decimal.from_float(0.1))
'0.1000000000000000055511151231257827021181583404541015625'

Another way would be like so.

>>> 0.1.hex()
'0x1.999999999999ap-4'

Both strings represent the exact contents of the float. Allmost anything else interprets the float as python thinks it was probably intended (which most of the time is correct).

Stefano Palazzo
  • 4,212
  • 2
  • 29
  • 40