2

I am trying to read a csv file with pandas that has some rows in scientific notation.

When it reads the values it is not capturing the true underlying number. When I re-purpose the data the true value gets lost.

df = pd.read_csv('0_IDI_Submitter_out.csv')

The underlying true values that I am trying to preserve are as follows:

      INPUT: Extra 1
0     8921107
1     56300839420000
2     56207557000000

However, pandas reads it as

 INPUT: Extra 1
0     8921107
1     5.63008E+13
2     5.62076E+13

If I try to write a new csv or use this data the values show as:

 INPUT: Extra 1
0     8921107
1     56300800000000
2     56207600000000

How can I get pandas to read the true number rather than the scientific notation which causes it to convert incorrectly?

David Seroy
  • 179
  • 1
  • 14
  • 'how to get rid of pandas converting large numbers in excel sheet to exponential..' https://stackoverflow.com/questions/38689125/how-to-get-rid-of-pandas-converting-large-numbers-in-excel-sheet-to-exponential – brokenfoot Sep 23 '19 at 17:05
  • 1
    @brokenfoot I believe that is just for display purposes which doesn't address my issue. In my case the value is literally being altered by pandas. – David Seroy Sep 23 '19 at 17:12
  • Cannot reproduce. The numbers that you show can be represented in `np.int64`, and pandas successfully reads them that way. Please show the **exact csv file as text**. If you have written it from Excel, the problem is probably there... – Serge Ballesta Sep 24 '19 at 06:25

2 Answers2

4

The problem appears to be that opening a CSV file in Excel which contains large numbers or strings that appear as large numbers like product codes, SKU's, UPC's etc are automatically converted into scientific notation. Once this has been done, you'll have to manually go into Excel and re-format but trying to do this from Pandas does not appear possible and the data integrity is lost.

However, if I never open the file in Excel and work on it purely through Pandas then it is all good. Similarly, if you work purely in Excel you're also good.

My ultimate conclusion is that when working with large numbers or strings that appear as large numbers like product codes or UPC's it is best not to mix pandas with Excel. As an alternative, I just started saving all my dataframes as pickle files instead of csv.

Hope that helps anyone in the future.

Thanks

David Seroy
  • 179
  • 1
  • 14
  • Additionally, to check how Excel is saving your values, open your CSV file in a text editor like Notepad, and check what's written on there, because this is exactly what pandas will read. You might notice that scientific notation numbers in the CSV do not contain the complete information for the number. – PJ_ May 02 '22 at 20:16
2

Can't seem to reproduce your problem, but maybe this will work?

df = pd.read_csv('0_IDI_Submitter_out.csv', dtype={'INPUT: Extra 1':np.object_})

Also, check the dtypes of your dataframe:

result = df.dtypes
print(result)
neutrino_logic
  • 1,289
  • 1
  • 6
  • 11