I'm on pandas v0.24+ and have been reading through "Keeping array type as integer while having a NaN value".
I'm getting the usual ValueError when trying to read in integer columns that contain NaN values:
Pandas: ValueError: Integer column has NA values in column 33
This is because NumPy integer types cannot hold NA values. The problem is that I don't actually know the data types of my CSV, so I'd still like pandas to infer what they are. Is there a way it can do this while using Int64 by default instead of int64, so that it doesn't halt and complain about NA values in the process?
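To make the problem concrete, here is a minimal sketch that reproduces the error on a small in-memory CSV. The data and the column names `a` and `b` are made up for illustration and are not from my real file. It shows that the nullable `Int64` dtype does work, but only when I name the column explicitly, which is exactly what I can't do without knowing the schema in advance.

```python
import io

import pandas as pd

# Made-up data: column "b" is integer-valued but has one missing entry.
csv_text = "a,b\n1,2\n3,\n"

# Default inference: the missing value makes pandas fall back to float64 for "b".
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.dtypes)

# Forcing the NumPy int64 dtype on "b" reproduces the ValueError.
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"b": "int64"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The nullable Int64 dtype accepts the missing value,
# but I have to know the column name up front.
df_nullable = pd.read_csv(io.StringIO(csv_text), dtype={"b": "Int64"})
print(df_nullable.dtypes)
```

What I'm after is the behaviour of the last call without having to spell out the `dtype` mapping myself.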
EDIT: This is what happens when I run
df = pd.read_csv(file)
Then I get:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/christopherturnbull/DATA_SCIENCE/PointTopic/access_test_v3.py", line 18, in <module>
    df = mdb.read_table(rdb_file,'v31a_v8_oct20_point_topic_availability_deliverable_201118')
  File "/Users/christopherturnbull/DATA_SCIENCE/virtualenvs/pointtopic/lib/python3.8/site-packages/pandas_access/__init__.py", line 127, in read_table
    return pd.read_csv(proc.stdout, *args, **kwargs)
  File "/Users/christopherturnbull/DATA_SCIENCE/virtualenvs/pointtopic/lib/python3.8/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/christopherturnbull/DATA_SCIENCE/virtualenvs/pointtopic/lib/python3.8/site-packages/pandas/io/parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "/Users/christopherturnbull/DATA_SCIENCE/virtualenvs/pointtopic/lib/python3.8/site-packages/pandas/io/parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "/Users/christopherturnbull/DATA_SCIENCE/virtualenvs/pointtopic/lib/python3.8/site-packages/pandas/io/parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1104, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1198, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Integer column has NA values in column 33
But
df = pd.read_csv(file, header=None)
seems to work, although now I no longer get the inferred dtypes.
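For what it's worth, here is a small sketch of what `header=None` appears to do, again on made-up data rather than my real file. As far as I can tell, the header row is read as an ordinary data row, so the columns get integer labels and every column ends up as a non-numeric (string/object) type. I'm assuming, but haven't confirmed, that something similar is happening in my case.

```python
import io

import pandas as pd

# Made-up data with a header row and one missing integer value.
csv_text = "a,b\n1,2\n3,\n"

# With header=None the first line is treated as data, not as column names.
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(df_no_header.columns.tolist())  # integer labels instead of "a" and "b"
print(df_no_header.dtypes)            # no numeric dtypes are inferred
print(df_no_header.head())
```

So the read succeeds, but only because nothing is being parsed as an integer any more.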