I have tried passing the `dtype` parameter to `read_csv` as `dtype={n: pandas.Categorical}`, but this does not work properly (the resulting column has dtype `object`). The manual is unclear.
-
Is one column categorical or are they all? – wegry May 16 '15 at 06:08
-
One or more, but not all. – Emre May 16 '15 at 06:24
-
Is `n` a string in your code snippet? (It probably should be.) Otherwise, I'd suggest using the `astype` method on the individual columns. – wegry May 16 '15 at 06:35
-
This is not possible at the moment (and passing `pd.Categorical` will not work in any case, as this is not a dtype). But you can open an enhancement request at https://github.com/pydata/pandas/issues – joris May 16 '15 at 09:47
-
pandas 0.21.0 has a [CategoricalDtype](https://pandas.pydata.org/pandas-docs/version/0.21/whatsnew.html#whatsnew-0210-enhancements-categorical-dtype); the example `read_csv(...)` there does what you want. – denis Nov 05 '17 at 16:22
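For reference, the `CategoricalDtype` route from the last comment might look roughly like the sketch below. It assumes pandas 0.21 or later and Python 3; the toy data and the category list `['a', 'c', 'e']` are made up for illustration and are not from the linked docs.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Toy CSV data (hypothetical, for illustration only).
data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

# Declare the allowed categories and their order up front.
# 'e' never occurs in the data but is still kept as a category.
col1_type = CategoricalDtype(categories=['a', 'c', 'e'], ordered=True)

# Only col1 is read as categorical; the other columns are inferred as usual.
df = pd.read_csv(StringIO(data), dtype={'col1': col1_type})

print(df.dtypes)
print(df['col1'].cat.categories)
```

Compared with the plain string `'category'`, this lets you fix the category set and ordering at read time; values that are not in the declared list should come back as `NaN`.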
1 Answer
Since pandas 0.19.0, you can pass `dtype='category'` to `read_csv`:

    import pandas as pd
    from io import StringIO  # pd.compat.StringIO was removed in later pandas versions

    data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
    df = pd.read_csv(StringIO(data), dtype='category')
    print(df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3
    print(df.dtypes)
    col1    category
    col2    category
    col3    category
    dtype: object

If you want only specific columns to be categorical, pass `dtype` a dictionary:

    df = pd.read_csv(StringIO(data), dtype={'col1': 'category'})
    print(df)
      col1 col2  col3
    0    a    b     1
    1    a    b     2
    2    c    d     3
    print(df.dtypes)
    col1    category
    col2      object
    col3       int64
    dtype: object
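If you are on a pandas version older than 0.19.0, or the frame is already loaded, the `astype` approach suggested in the comments is a workable fallback. A minimal sketch, assuming Python 3 and the same toy data as above; the choice of `col1` and `col2` as the columns to convert is just for illustration:

```python
from io import StringIO

import pandas as pd

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

# Read with default type inference first.
df = pd.read_csv(StringIO(data))

# Convert the chosen columns one by one after loading.
for col in ['col1', 'col2']:
    df[col] = df[col].astype('category')

print(df.dtypes)
```

The downside is that the columns are first materialized with their inferred dtype, so you do not get the memory savings during parsing.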

jezrael
-
I think yes, use `df = pd.read_csv(StringIO(data), dtype={'col1':'category'}, index_col='col1')` – jezrael Feb 01 '17 at 07:53