2

I am trying to dump the data in excel (xlsx) file into a text file using xlrd module and having problems with float precessions.

Have found a couple of post with similar issue where precession was lost some where in the last of 16 decimal digits .

Below is the data copied from xlsx:

VALUE;DATA
1.01    HELLO
2.11    1/1/2014
3.21    ONE
4.31    1/1/2014 12:14
5.441   $10 
6.241   TWO
77.11   Zulfi
8.11    99
9.11    99.999
10.11   0

Below is what I get from xlrd:

 1.01|'HELLO'
2.1099999999999999|'2014-01-01 00:00:00.000000'
3.21|'ONE'
4.3099999999999996|'2014-01-01 12:14:00.000000'
5.4409999999999998|10.0
6.2409999999999997|'TWO'
77.109999999999999|'Zulfi'
8.1099999999999994|99.0
9.1099999999999994|99.998999999999995
10.109999999999999|0.0
  1. "2.11" has become "2.1099999999999999" and "4.31" has become "4.3099999999999996" etc...
  2. "$10" has become "10"
  3. Date values look little different (this one I understand and I get what I coded)

I do understand some mysteries of floats where there is a possibility of loosing precession for values which have too many digits post the decimal but here I have only two digits.

I am testing the output by comparing with that of opensource ETL tool "Pentaho" (written in java) which can read/write excel files and this tool seems to have no problem reading the values as they appear in the xlsx file (the fields were being read as both strings and numbers with length 30 and precession 20)

Here is what Pentaho reads

VALUE;DATA
 1.01;HELLO
 2.11;2014/01/01 00:00:00.000
 3.21;ONE
 4.31;2014/01/01 12:14:00.000
 5.441; 10.0
 6.241;TWO
 77.11;Zulfi
 8.11; 99.0
 9.11; 99.999
 10.11; 0.0

and below is my python code :

for rownum in xrange(sh.nrows):
    for colnum in xrange(sh.ncols):
        cell_obj = sh.cell(rownum,colnum)
        cell_val=sh.cell_value(rownum,colnum)
        if cell_obj.ctype == xlrd.XL_CELL_DATE:
            year, month, day, hour, minute, second = xlrd.xldate_as_tuple(cell_val, wb.datemode)
            py_date = datetime.datetime(year, month, day, hour, minute, second).strftime("%Y-%m-%d %H:%M:%S.%f")
            cell_val = py_date
        if (colnum==0):
            row_values=repr(cell_val)
        else :
            row_values=row_values+fdel+repr(cell_val)
    if (((row_values).find("\\n"))>-1):
        NLFlag=1
    file_output.write((row_values).replace('\\n','') + "\n")
    row_values=''
file_output.close()

Any help on this is highly appreciated.

Thanks

zulfi123786
  • 165
  • 1
  • 2
  • 13

1 Answers1

0

Floats, aka double precision reals, only have about 16 decimal digits of precision. They can only represent decimal fractions exactly when the fraction is of the form 2^-n, so multiples of 1/8th, or 1/1024th. All other decimals may be inexact.

If you print out a float with no further instructions, you will get the system's best effort at representing the binary fraction in decimal, so 2.099999999999 etc instead of 2.1. If however, you know that your floats do not represent more than (say) 3 decimal digits, then you can force them to be rounded on output, for instance by using the string format function

file_output.write('{0:.3f}'.format(f_num))

will output f_num correctly rounded down and padded up with zeroes to 3 places

I would guess that by default, the xlrd library interprets any field that can be as a float. There should be a switch to force all the reads to be as strings, in which case what you print out will be exactly what you read.

Neil_UK
  • 1,043
  • 12
  • 25
  • The `xlrd` library isn't doing any interpretation here: the Excel file contains a (binary) float, and `xlrd` reads that binary float exactly. The confusion arises from the fact that Excel and Python (looks like Python 2.6, by the way) choose to display that float in different ways: Excel hides the imprecision, while Python shows (some) of it. – Mark Dickinson Jun 23 '14 at 15:27
  • So basically Python is printing out 16 digits of precision, while Excel works to 15, then lops off the trailing 0s, and prints what's left. This is arguably the more user friendly thing to do in this circumstance. Like if you've entered numbers in a spreadsheet, they will have a handful of digits. If you are worried about the difference between 15 and 16 digits, then you are a more sophisticated user, and will make it your business to know how to get them. – Neil_UK Jun 23 '14 at 20:41
  • Yep, pretty much (though Python's output is based on 17 significant digits, not 16). The key point is that both Excel and Python are storing the exact same numeric value; they differ only in how they choose to display that value. (There are arguments to be had about the relative merits of the two display methods, but I'm not sure this is the place for them.) – Mark Dickinson Jun 24 '14 at 07:21
  • Thanks for the reply, In this case I have a column with dollar amounts with different digits of precession. Had they been of the same precession I can round them off. – zulfi123786 Jun 24 '14 at 10:40
  • Tried to check if the "Format" Class for the cell contains how many digits of precession the number in a specific cell has to round it to that precession but the best I could find was "General" as the cell format. The below Perl code surprisingly dumps the excel contetns as they are displayed/entered into the excel, Any Idea how this is made possible ? Does each cell store the entered numbers length and precession which Perl is fetching before dumping the cell contents. – zulfi123786 Jun 24 '14 at 11:24
  • `use strict; use warnings; use Win32::OLE qw(in with); use Win32::OLE::Const; use Win32::OLE::Const 'Microsoft Excel'; my $filename = $ARGV[0]; &Convert_Excel_CSV($filename); #****************** Converting Excel File to CSV Format ********************* sub Convert_Excel_CSV() { my $ExcelFile=shift; my $CsvFile = $ExcelFile; $CsvFile=~s/\.xlsx?/\.csv/; my $Excel = (Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new('Excel.Application')); $Excel->{'Visible'} = 0; # if you want to see what's going on` – zulfi123786 Jun 24 '14 at 11:26
  • `$Excel->Workbooks->Open("$ExcelFile")|| die("Unable to open document ", Win32::OLE->LastError()); unlink "$CsvFile" if (-e "$CsvFile"); $Excel->{DisplayAlerts} = "False"; $Excel->ActiveWorkbook->SaveAs ({ FileName => "$CsvFile", FileFormat => xlCSV, }); if(Win32::OLE->LastError()==0) { print "\n\t******Saving File As CSV:\t$CsvFile******\n"; } else { die("Unable to save document ", Win32::OLE->LastError()); } $Excel->ActiveWorkbook->Close(); $Excel->Quit; }` – zulfi123786 Jun 24 '14 at 11:26
  • Perl Output (Windows) 1.01,HELLO 2.11,1/1/2014 3.21,10:09:30.000 4.31,1/1/2014 12:14 5.441,$10 6.241,12:10:00 AM 77.11,Zulfi 8.11,99 9.11,99.999 10.11,0 – zulfi123786 Jun 24 '14 at 11:28