By default, pandas.read_csv() reads a string column with dtype object. Since pandas 1.0, it is possible to read such a column with the dedicated string dtype instead. I'm reading a CSV in which most columns are strings. Can I tell pandas to (attempt to) read all non-numeric columns as string dtype by default, rather than as object dtype?
The code:
import pandas
import io
s = """2,e,4,w
3,f,5,x
4,g,6,z"""
df = pandas.read_csv(io.StringIO(s))
print(df.dtypes)
df = pandas.read_csv(
    io.StringIO(s),
    dtype=dict.fromkeys([1, 3], pandas.StringDtype()))
print(df.dtypes)
This results in:
2 int64
e object
4 int64
w object
dtype: object
2 int64
e string
4 int64
w string
dtype: object
I'm using pandas 1.0.0rc0. Reading all text columns as string dtype directly should prevent problems with mixed types when writing to an HDFStore.
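A post-read conversion is one possible workaround, though it is not the read-time default asked about here. The sketch below assumes it is acceptable to let read_csv infer object dtype first and then cast those columns; the names csv_text and object_cols are illustrative only.

```python
import io

import pandas as pd

# Same sample data as above; the first row is taken as the header.
csv_text = """2,e,4,w
3,f,5,x
4,g,6,z"""

df = pd.read_csv(io.StringIO(csv_text))

# Columns that read_csv left as object dtype (here, the text columns).
object_cols = df.select_dtypes(include="object").columns

# Cast only those columns to the dedicated string dtype;
# numeric columns keep whatever dtype read_csv inferred.
df = df.astype({col: "string" for col in object_cols})

print(df.dtypes)
```

DataFrame.convert_dtypes() (also new in pandas 1.0) may be a shorter alternative, but by default it converts integer columns to the nullable Int64 dtype as well, which may or may not be wanted.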