0

Large integer values imported from an excel document are being represented as scientific notation when I use the pd.read_excel() function.

I have attempted all of the solutions found here, here, here, here, and here to no avail.

When I export the file as a CSV from within excel, it does not round the numbers, but I have too many Excel documents to do this manually.

This is an image of the excel value:

enter image description here

This is an image of the same column within pandas using pd.read_excel('filename.xlsx'):

enter image description here

Here are the sample numbers used:

6265457510690330000
6669887892027880000

Any input would be greatly appreciated.

NYezhov
  • 45
  • 5
  • Excel doesn't have integer values. All Excel numbers are floats. `pandas` knows this. – BoarGules May 20 '22 at 23:22
  • What's the problem? You're only seeing the formatted representation in Pandas. – Charlie Clark May 23 '22 at 11:21
  • @BoarGules, this is an interesting observation. The value I'm interested is a long numeric ID value that I presume Excel is interpreting as a float and converting to scientific notation. This is a problem because anything other than the exact integer value does not represent the correct ID. – NYezhov May 23 '22 at 13:28
  • @CharlieClark, the issue is that this ID number is being rounded. In the Excel document the column does not appear rounded, but when imported it the excel document into pandas the value is rounded down (In a way that does not make a ton of sense given the location of the decimal place). I unzipped the Excel document in 7zip and found the column values in the xml document are saved in scientific notation, which I think is part of the issue. If Pandas is reading the document in earnest then the document is the issue, not pandas. – NYezhov May 23 '22 at 13:30
  • 1
    Excel promises accuracy only to 15 significant digits. It needs the other 2 to conceal the magic that leads you to suppose they aren't floats. If you need more digits than that you will need to store the id numbers in Excel as strings: `'01234567890987654321`. – BoarGules May 23 '22 at 13:32
  • @BoarGules. Brutal. So here is a dumb solution. When I use ```pd.read_excel('file_name.xlsx', dtype={'_id': str})``` I can get the scientific notation string before its been reduced. I'm thinking of just breaking out the number values from the string, dropping the e+18 and using zfill to convert the ID back to what I need it to be. – NYezhov May 23 '22 at 13:49
  • 1
    The problem is with Excel. The data format is limited to 15 digits and even if you add more, Excel will remove precision. You **must** use strings to avoid this. – Charlie Clark May 23 '22 at 16:06

1 Answers1

0

Pandas has display options. I use these in one of my scripts. They should provide a useful template for you.

# set display options for table
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('precision', 4) 

If it is floats try this:

pd.set_option('display.float_format', lambda x: '%16.2f' % x)
John
  • 435
  • 6
  • 15