50

I tried passing the dtype parameter to read_csv as dtype={n: pandas.Categorical}, but this does not work properly (the resulting column has dtype object). The manual is unclear.

jezrael
Emre
  • Is one column categorical or are they all? – wegry May 16 '15 at 06:08
  • One or more, but not all. – Emre May 16 '15 at 06:24
  • Is n a string in your code snippet? (It probably should be.) Otherwise, I'd suggest using the astype method on the individual columns. – wegry May 16 '15 at 06:35
  • This is not possible at the moment (and passing `pd.Categorical` will not work in any case, as it is not a dtype). But you can open an enhancement request at https://github.com/pydata/pandas/issues – joris May 16 '15 at 09:47
  • pandas 0.21.0 has a [CategoricalDtype](https://pandas.pydata.org/pandas-docs/version/0.21/whatsnew.html#whatsnew-0210-enhancements-categorical-dtype); the example `read_csv(...)` there does what you want. – denis Nov 05 '17 at 16:22
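
The astype fallback suggested in the comments can be sketched as follows. This is a minimal illustration, not code from the thread: it reuses the sample data from the answer below, and the list of columns to convert is an assumption chosen for the example.

```python
from io import StringIO

import pandas as pd

# Sample data borrowed from the answer; any CSV source works the same way.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Read with default dtype inference, then convert selected columns afterwards.
df = pd.read_csv(StringIO(data))
for col in ["col1", "col2"]:  # hypothetical choice of columns to convert
    df[col] = df[col].astype("category")

print(df.dtypes)
```

This approach does not depend on read_csv accepting a categorical dtype, so it is an option on pandas versions older than 0.19.0, at the cost of first materializing the columns with their inferred dtypes.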

1 Answer

65

Since pandas 0.19.0, you can pass dtype='category' to read_csv:

from io import StringIO
import pandas as pd

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
# io.StringIO replaces pd.compat.StringIO, which newer pandas releases no longer provide
df = pd.read_csv(StringIO(data), dtype='category')
print(df)
  col1 col2 col3
0    a    b    1
1    a    b    2
2    c    d    3

print(df.dtypes)
col1    category
col2    category
col3    category
dtype: object

To convert only specific columns to category, pass a dictionary to dtype:

# Only col1 becomes categorical; the other columns keep their inferred dtypes
df = pd.read_csv(StringIO(data), dtype={'col1':'category'})
print(df)
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

print(df.dtypes)
col1    category
col2      object
col3       int64
dtype: object
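
As a comment on the question notes, pandas 0.21.0 added CategoricalDtype, which can also be passed per column when the set of categories should be fixed in advance. The following is a rough sketch under that assumption (pandas 0.21.0 or later); the category list, including the unused value 'e', is made up for illustration.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Hypothetical fixed category set; 'e' does not occur in the data
# but remains a valid category of the resulting column.
col1_type = CategoricalDtype(categories=["a", "c", "e"], ordered=False)

df = pd.read_csv(StringIO(data), dtype={"col1": col1_type})

print(df["col1"].cat.categories)
print(df.dtypes)
```

Declaring the categories up front may be useful when several files must share one category set, since dtype='category' infers the categories separately from each file's contents.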
jezrael