Converting long integers to strings in pandas (to avoid scientific notation)

Question

I want the following records, created using pd.read_excel(), to be interpreted differently. They currently display as 3.200000e+18, but each is actually (hopefully) a different long integer:

ipdb> self.after['class_parent_ref']
class_id
3200000000000515954    3.200000e+18
3200000000000515951             NaN
3200000000000515952             NaN
3200000000000515953             NaN
3200000000000515955    3.200000e+18
3200000000000515956    3.200000e+18
Name: class_parent_ref, dtype: float64

Currently, they seem to 'come out' in scientific notation:

ipdb> self.after['class_parent_ref'].iloc[0]
3.2000000000005161e+18

Worse, though, it's not clear to me that the number has been read correctly from my .xlsx file:

ipdb> self.after['class_parent_ref'].iloc[0] -3.2e+18
516096.0

The number in Excel (the data source) is 3200000000000515952.
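The 516096.0 offset is what float64 rounding predicts for this value. Here is a minimal standard-library check. It assumes CPython, whose `float` is an IEEE 754 double like a pandas float64 column, and Python 3.9+ for `math.ulp`. The variable names are illustrative only.

```python
import math

excel_value = 3200000000000515952   # the value shown in the Excel cell
as_float = float(excel_value)       # what a float64 column can actually store

print(int(as_float))                # 3200000000000516096 (nearest double)
print(int(as_float) - excel_value)  # 144: the rounding error
print(math.ulp(as_float))           # 512.0: gap between adjacent doubles at this size
print(as_float - 3.2e18)            # 516096.0, the same result as the ipdb session
```

Because adjacent doubles near 3.2e18 are 512 apart, no float64 can hold 3200000000000515952 exactly.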

This is not about the display, which I know I can change here. It's about keeping the underlying data in the same form it was in when read (so that if/when I write it back to Excel, it'll look the same and so that if I use the data, it'll look like it did in Excel and not Xe+Y). I would definitely accept a string if I could count on it being a string representation of the correct number.

You may notice that the number I want to see is in fact (incidentally) one of the labels. Pandas correctly read those in as strings (perhaps because Excel treated them as strings?) unlike this number which I entered. (Actually though, even when I enter ="3200000000000515952" into the cell in question before redoing the read, I get the same result described above.)

How can I get 3200000000000515952 out of the dataframe? I'm wondering whether pandas has a limitation with long integers, but the only thing I've found on it is a little dated and doesn't look like the same problem I'm facing.

Thank you!

The problem is that you have floats, not integers, and your number is too big to be stored with that precision as a float. You end up with floats because of the `NaN` values (`NaN` is not supported in integer columns, so the column is cast to float). — joris, Oct 27 '14 at 20:20
Thanks, @joris. Using the keep_default_na=False kwarg of read_excel() seems to have solved the problem. Feel free to answer accordingly and I'll 'check' it. — HaPsantran, Oct 27 '14 at 23:03
@HaPsantran you might just want to provide your own answer as joris seems not to have noticed your suggestion. — JohnE, May 02 '19 at 14:06
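The upcasting joris describes can be reproduced without an Excel file. This is a small sketch with made-up Series standing in for the column; `big`, `complete`, and `with_gap` are illustrative names, not anything from the question.

```python
import pandas as pd

big = 3200000000000515952

# With no missing values the column stays int64 and the value is exact.
complete = pd.Series([big, big + 3])
print(complete.dtype, complete.iloc[0])        # int64 3200000000000515952

# A single missing value makes pandas store the whole column as float64,
# so the big integer is rounded to the nearest representable double.
with_gap = pd.Series([big, None])
print(with_gap.dtype, int(with_gap.iloc[0]))   # float64 3200000000000516096
```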

score 2 · Accepted Answer · answered Aug 28 '19 at 17:33

Convert the NaN values in the column to 0, then cast the column to integer:

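# Replace missing values with 0, then cast to a 64-bit integer.
# (astype(int) is 32-bit on some platforms, e.g. Windows with NumPy < 2, and would overflow here.)
df['class_parent_ref'] = df['class_parent_ref'].fillna(0).astype('int64')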
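One thing worth checking before relying on this first approach: if the column was already read as float64, the rounding has happened, and casting back to int64 returns the rounded number rather than the one in Excel. A minimal sketch, with a hand-built DataFrame standing in for the result of pd.read_excel():

```python
import pandas as pd

# Stand-in for a column pandas has already read as float64 because of a blank cell.
df = pd.DataFrame({"class_parent_ref": [3200000000000515952, None]})

# Fill NaN with 0, then cast to a 64-bit integer.
df["class_parent_ref"] = df["class_parent_ref"].fillna(0).astype("int64")

print(df["class_parent_ref"].tolist())  # [3200000000000516096, 0]
```

The column is integer again, but the first value is 3200000000000516096, not 3200000000000515952; the blank is now indistinguishable from a real 0.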

Alternatively, when reading the file, pass keep_default_na=False to pd.read_excel() (or na_filter=False to pd.read_csv()) so that blank cells are not turned into NaN in the first place.
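A sketch of the reading-side fix, using pd.read_csv() on an in-memory CSV so it runs without an .xlsx file. The CSV contents are made up to mirror the question, and `dtype=str` is an addition not mentioned in the answer: it keeps every value as the literal text from the file, which matches the asker's "I would accept a string" requirement.

```python
import io

import pandas as pd

# Hypothetical CSV mirroring the question; the second parent ref is blank.
csv_text = (
    "class_id,class_parent_ref\n"
    "3200000000000515954,3200000000000515952\n"
    "3200000000000515951,\n"
)

# na_filter=False: blank fields stay as "" instead of becoming NaN.
# dtype=str: values are kept as the exact text, so nothing passes through float64.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, na_filter=False)
print(df["class_parent_ref"].tolist())  # ['3200000000000515952', '']

# If exact integers are needed later, Python ints have arbitrary precision.
exact = [int(v) if v else None for v in df["class_parent_ref"]]
print(exact)                            # [3200000000000515952, None]
```

For Excel input the analogous call would pass the same keywords to pd.read_excel(); whether the value survives also depends on how the cell is stored in the workbook.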
