Is it possible to read categorical columns with pandas' read_csv?

Question

I tried passing `dtype={n: pandas.Categorical}` to `read_csv`, but this does not work properly: the resulting column has dtype `object`. The manual is unclear on this.

Is `n` a string in your code snippet? It probably should be. Otherwise, I'd suggest using the `astype` method on the individual columns. – wegry, May 16 '15 at 06:35
This is not possible at the moment (and passing `pd.Categorical` will not work in any case, as this is not a dtype). But you can open an enhancement request at https://github.com/pydata/pandas/issues — joris, May 16 '15 at 09:47
pandas 0.21.0 has a [CategoricalDtype](https://pandas.pydata.org/pandas-docs/version/0.21/whatsnew.html#whatsnew-0210-enhancements-categorical-dtype); the example `read_csv(...)` there does what you want. – denis, Nov 05 '17 at 16:22

jezrael · Accepted Answer · 2018-12-24T07:31:54.017

65

Since pandas 0.19.0 you can pass dtype='category' to read_csv:

import pandas as pd
from io import StringIO  # pd.compat.StringIO was removed in pandas 0.25

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
df = pd.read_csv(StringIO(data), dtype='category')
print(df)
  col1 col2 col3
0    a    b    1
1    a    b    2
2    c    d    3

print(df.dtypes)
col1    category
col2    category
col3    category
dtype: object
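
One caveat worth checking, based on my reading of the pandas I/O documentation: with dtype='category' the category labels are parsed as strings, so col3 above holds the labels '1', '2', '3' rather than integers. A minimal sketch of converting them back, assuming every label in the column is numeric (the name `numeric_labels` is only illustrative):

```python
from io import StringIO

import pandas as pd

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
df = pd.read_csv(StringIO(data), dtype='category')

# The categories of col3 were read as strings, not integers.
print(df['col3'].cat.categories.tolist())

# Convert the category labels to numbers; the column itself stays categorical.
numeric_labels = pd.to_numeric(df['col3'].cat.categories)
df['col3'] = df['col3'].cat.rename_categories(numeric_labels)
print(df['col3'].cat.categories.tolist())
```

If the column should simply be numeric rather than categorical, leaving it out of the dtype mapping is probably simpler.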

If you want only specific columns to be categorical, pass a dictionary to dtype:

# StringIO comes from the standard library: from io import StringIO
df = pd.read_csv(StringIO(data), dtype={'col1': 'category'})
print(df)
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

print (df.dtypes)
col1    category
col2      object
col3       int64
dtype: object
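
As the comment about pandas 0.21.0 suggests, a `CategoricalDtype` instance can also be passed in the dtype dictionary when the categories or their order should be fixed up front. This is a rough sketch, assuming pandas 0.21 or later; the category list and the name `col1_type` are made up for illustration:

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

# Hypothetical category list: 'b' is declared even though it never appears in col1.
col1_type = CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)

df = pd.read_csv(StringIO(data), dtype={'col1': col1_type})
print(df['col1'].cat.categories.tolist())
print(df['col1'].cat.ordered)
```

Declaring the categories explicitly can be useful when several files should share the same category set, since values missing from one file are still registered as valid categories.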

edited Dec 24 '18 at 07:31

answered Oct 03 '16 at 11:24


I think yes, use `df = pd.read_csv(StringIO(data), dtype={'col1':'category'}, index_col='col1')` – jezrael Feb 01 '17 at 07:53

This just made my day. – Relaxed1 Dec 15 '18 at 15:51
