By default, pandas.read_csv() reads a string column with dtype object. Since pandas 1.0, it is possible to read such a column with the dedicated string dtype (pandas.StringDtype) instead. I'm reading a CSV where most columns are strings. Can I tell pandas to (attempt to) read all non-numeric columns as string dtype by default rather than as object dtype?

The code:

import pandas
import io

s = """2,e,4,w
3,f,5,x
4,g,6,z"""
df = pandas.read_csv(io.StringIO(s))
print(df.dtypes)
df = pandas.read_csv(
        io.StringIO(s),
        dtype=dict.fromkeys([1, 3], pandas.StringDtype()))
print(df.dtypes)

This results in:

2     int64
e    object
4     int64
w    object
dtype: object
2     int64
e    string
4     int64
w    string
dtype: object

I'm using pandas 1.0.0rc0. Reading all text columns directly as string dtype should prevent problems with mixed types when writing to an HDFStore.

Karthick Mohanraj
gerrit
  • Decided to start the [tag:pandas-1.0] tag as its release is imminent and its major changes are likely to trigger many specific questions. See [this question on meta](https://meta.stackoverflow.com/q/393206/974555). – gerrit Jan 22 '20 at 09:15

1 Answer

This is not possible in pandas 1.0. There is currently (2020-01-22) an open issue on GitHub and an open pull request adding this feature. The feature is currently targeted for pandas 1.1. Quoting the issue:

With the new dtypes (IntegerArray, StringArray, etc.), if you want to use them when reading in data, you have to specify the types for all of the columns. It would be nice to have the option to use the new dtypes for all columns as a keyword to read_csv(), read_excel(), etc.

The exact API is still to be decided; I will update this answer when it is.

For now, you have to explicitly pass the names of all columns that should be read as strings, using the dtype argument of read_csv().
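If typing out the column names is impractical, one possible workaround is to read the file twice: once to let pandas infer the dtypes, and once more with an explicit `dtype` mapping for the columns that were not inferred as numeric. The following is only a minimal sketch under a few assumptions: the variable name `text_columns` and the sample size `nrows=100` are made up for illustration, and it assumes the first rows are representative of the whole file.

```python
import io

import pandas

s = """2,e,4,w
3,f,5,x
4,g,6,z"""

# First pass: read a small sample and let pandas infer the dtypes.
sample = pandas.read_csv(io.StringIO(s), nrows=100)

# Collect the columns that were inferred as text (object or string dtype).
text_columns = [
    name
    for name, dtype in sample.dtypes.items()
    if pandas.api.types.is_string_dtype(dtype)
]

# Second pass: request the string dtype explicitly for those columns.
df = pandas.read_csv(
    io.StringIO(s),
    dtype={name: pandas.StringDtype() for name in text_columns},
)
print(df.dtypes)
```

For a real file you would pass the path instead of `io.StringIO(s)` in both calls. Note that an object column can also hold mixed values, so a column that looks like text in the sample is not guaranteed to convert cleanly.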

gerrit
  • This is now possible in pandas 1.2 by calling the `convert_dtypes` method on your `DataFrame` after you have created it. [Link](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.convert_dtypes.html) – amin_nejad Oct 28 '21 at 18:19
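The `convert_dtypes` approach from the comment above can be sketched as follows, reusing the data from the question. This converts after reading rather than while reading, so the columns are still parsed as object dtype first. As far as I can tell from the linked documentation, `DataFrame.convert_dtypes` has been available since pandas 1.0.

```python
import io

import pandas

s = """2,e,4,w
3,f,5,x
4,g,6,z"""

# Read with the default inference, then convert every column to the
# best matching extension dtype (text columns become string dtype).
df = pandas.read_csv(io.StringIO(s)).convert_dtypes()
print(df.dtypes)
```

Be aware that with its default arguments `convert_dtypes` also converts the other columns, for example integer columns to the nullable `Int64` dtype. The `convert_integer` and `convert_boolean` keyword arguments can be set to `False` if only the string conversion is wanted.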