Import pandas dataframe column as string not int

Question

I would like to import the following CSV column as strings, not as int64. pandas' read_csv automatically converts it to int64, but I need this column as strings.

ID
00013007854817840016671868
00013007854817840016749251
00013007854817840016754630
00013007854817840016781876
00013007854817840017028824
00013007854817840017963235
00013007854817840018860166

df = read_csv('sample.csv')

df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Unfortunately, using converters gives the same result.

df = read_csv('sample.csv', converters={'ID': str})
df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

score 222 · Accepted Answer · edited Dec 02 '20 at 04:12

To reiterate, this works in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm also creating an issue about detecting integer overflows.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
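For anyone who wants to try this without a file on disk, here is a minimal sketch. It assumes an in-memory stand-in for sample.csv built from the first three rows in the question (the `csv_text` variable is hypothetical), and it also shows why these values cannot be held in an int64 in the first place:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for sample.csv (first three rows from the question)
csv_text = (
    "ID\n"
    "00013007854817840016671868\n"
    "00013007854817840016749251\n"
    "00013007854817840016754630\n"
)

# dtype={'ID': object} tells the parser to keep the column as Python strings
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": object})

first_id = df["ID"].iloc[0]
print(first_id)  # leading zeros are preserved

# The numeric value is larger than the biggest int64
int64_max = 2**63 - 1
print(int(first_id) > int64_max)
```

Because the column is never parsed as a number, neither the leading zeros nor the high-order digits are lost.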

Update as it helps others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype=str)

To read only selected columns as str, one can do this:

# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
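A self-contained sketch of both variants follows. The CSV contents and the extra `count` column are made up for illustration; only the `prefix` and `serial` column names come from the snippet above:

```python
import io

import pandas as pd

# Hypothetical data: two columns that must stay strings, one ordinary integer column
csv_text = "prefix,serial,count\n007,000123,5\n042,000456,7\n"

str_cols = ["prefix", "serial"]
dtypes = {col: str for col in str_cols}

# Only the listed columns are forced to str; 'count' is still inferred as an integer
selected = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(selected["prefix"].tolist(), selected["count"].tolist())

# dtype=str applies to every column, so 'count' is read as text as well
everything = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(everything["count"].tolist())
```

The dict form is usually the safer choice when the file also holds genuinely numeric columns you want to do arithmetic on.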

answered Nov 14 '12 at 17:58 by Wes McKinney · edited Dec 02 '20 at 04:12 by ihightower

It also seems that if you want all columns to be interpreted as strings, you can use `dtype = str`. – steveb Jul 06 '17 at 18:09

It seems empty fields still come through as np.nan. – Josiah Yoder Sep 19 '19 at 22:03

Same issue here, but using keep_default_na = False resolved it. – jtcloud Feb 10 '20 at 15:00

Thank you for the comments. I also had to use dtype=str AND keep_default_na = False so that null values weren't nan. – Ross117 Jul 23 '20 at 18:22

Using high-digit integers as strings saves a lot of headaches. Hero or villain? YOU'RE A HERO!! – Soy César Mora May 24 '22 at 20:21
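The behaviour described in these comments can be checked with a small sketch. The two-column CSV below is invented for illustration and is not the asker's file:

```python
import io

import pandas as pd

# Hypothetical data with one empty field in the 'note' column
csv_text = "ID,note\n0001,ok\n0002,\n"

# With dtype=str alone, the empty field is still parsed as a missing value (NaN)
with_nan = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(with_nan["note"].isna().tolist())

# keep_default_na=False turns off the default NaN markers, so the field stays ''
no_nan = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
print(no_nan["note"].tolist())
```

Note that keep_default_na=False also stops strings such as "NA" or "null" from being read as missing, which may or may not be what you want.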

spencerlyon2 · Answer 2 · 2012-11-10T01:06:15.310

This probably isn't the most elegant way to do it, but it gets the job done.

In[1]: import numpy as np

In[2]: import pandas as pd

In[3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])

In[4]: df
Out[4]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

Just replace '/Users/spencerlyon2/Desktop/test.csv' with the path to your file.
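The same approach can be sketched without a file on disk. This assumes np.genfromtxt is given an in-memory text buffer instead of a path, and the `csv_text` contents are a hypothetical two-row excerpt of the question's data:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for test.csv
csv_text = "ID\n00013007854817840016671868\n00013007854817840016749251\n"

# dtype=str makes genfromtxt return every field, including the header, as text
raw = np.genfromtxt(io.StringIO(csv_text), dtype=str)

# Drop the header row and wrap the remaining values in a one-column DataFrame
df_np = pd.DataFrame(raw[1:], columns=["ID"])
print(df_np["ID"].tolist())
```

This only suits a single-column file like the one in the question; for anything wider, the read_csv dtype argument is less work.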

score 19 · Answer 3 · edited Jun 03 '20 at 09:47

Since pandas 1.0 this has become much more straightforward. This will read column 'ID' as dtype 'string':

pd.read_csv('sample.csv', dtype={'ID': 'string'})

As the Getting started guide shows, a dedicated 'string' dtype has been introduced (previously, strings were treated as dtype 'object').
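A quick way to confirm the resulting dtype is sketched below. It assumes pandas 1.0 or later and uses a hypothetical in-memory excerpt of the question's data in place of sample.csv:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for sample.csv
csv_text = "ID\n00013007854817840016671868\n00013007854817840016749251\n"

df_string = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "string"})

# The column uses the dedicated string extension dtype rather than object
print(df_string["ID"].dtype)
print(isinstance(df_string["ID"].dtype, pd.StringDtype))

# Values keep their leading zeros
print(df_string["ID"].iloc[0])
```

One practical difference from object dtype is that missing values in a 'string' column show up as pd.NA rather than np.nan.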

answered Apr 14 '20 at 03:03 by denis_smyslov · edited Jun 03 '20 at 09:47 by Itchydon
