Pandas is messing with a high resolution integer on read_csv

Question

EDIT: This was Excel's fault changing the data type, not Pandas.

When I read a CSV using pd.read_csv(file) a column of super long ints gets converted to a low res float. These ints are a date time in microseconds.

example: CSV Columns of some values:

15555071095204000
15555071695202000
15555072295218000
15555072895216000
15555073495207000
15555074095206000
15555074695212000
15555075295202000
15555075895210000
15555076495216000
15555077095230000
15555077695206000
15555078295212000
15555078895218000
15555079495209000
15555080095208000
15555080530515000
15555086531880000
15555092531889000
15555098531886000
15555104531886000
15555110531890000
15555116531876000
15555122531873000
15555128531884000
15555134531884000
15555140531887000
15555146531874000

pd.read_csv produces: 1.55551e+16

how do I get it to report the exact int?

I've tried using: float_precision='high'

score 3 · Answer 1 · answered Apr 17 '19 at 14:10

3

It's possible that this is caused by the way Pandas handles missing values, meaning that your column is importing as floats, to allow the missing values to be coded as NaN.

A simple solution would be to force the column to import as a str, then impute or remove missing values, and the convert to int:

import pandas as pd
df = pd.read_csv(file, dtypes={'col1': str}) # Edit to use appropriate column reference
# If you want to just remove rows with missing values, something like:
df = df[df.col1 != '']
# Then convert to integer
df.col1 = df.col1.astype('int64')

With a Minimal, Complete and Verifiable Example we can pinpoint the problem and update the code to accurately solve it.

answered Apr 17 '19 at 14:10

ajrwhite

7,728
1
11
24

Hi arjwhite! Thank you for attempting to answer an impossible question. I realized this is not pandas fault. It's Excels. Upon opening and saving this csv file with Excel the problem occurs. Pandas is being well behaved. – Adam Schulz Apr 17 '19 at 14:45
1

Yes, that was the other big possibility, but wasn't quite consistent with the way the question was framed. Generally you should avoid playing around with CSVs in both Excel and Pandas, as Excel is unpredictable with ambiguous data types. You may want to look at the `pd.read_excel` function if you have to continue using Excel. Also look at saving Pandas CSVs with quotation marks, to preserve formatting - `import csv`, `df.to_csv(... , quoting=csv.QUOTE_ALL)` – ajrwhite Apr 17 '19 at 16:28

Pandas is messing with a high resolution integer on read_csv

1 Answers1