
I have a CSV file with no headers. It has around 35 columns.

I am reading this file using pandas. Currently, the issue is that when it reads the file, it automatically assigns a datatype to each column.

How do I avoid this automatic data type assignment?

I have a column C which I want to store as a string instead of an int, but pandas automatically assigns it to int.

I tried 2 things.

1)

import pandas as pd

my_df = pd.read_csv('my_csv_file.csv', names=['A','B','C'...'Z'], converters={'C': str}, engine='python')

The above code gives me this error:

ValueError: Expected 37 fields in line 1, saw 35

If I remove converters={'C':str}, engine='python', there is no error.

2)

old_df['C'] = old_df['C'].astype(str)

The issue with this approach is that if the value in the column is '00123', it has already been read in as the integer 123, and astype(str) then just gives '123'. The leading zeroes are lost because pandas thinks the column is an integer.
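For illustration, here is a minimal sketch of that loss, using a made-up two-column headerless file (demo.csv and its values are hypothetical, not the real 35-column file):

import pandas as pd

# hypothetical headerless file whose second column holds zero-padded codes
with open('demo.csv', 'w') as f:
    f.write('1,00123\n2,00456\n')

df = pd.read_csv('demo.csv', header=None, names=['A', 'C'])
print(df['C'].tolist())              # [123, 456]  -> zeroes already gone at read time
print(df['C'].astype(str).tolist())  # ['123', '456']  -> too late to recover them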

Neil
http://stackoverflow.com/questions/12101113/prevent-pandas-from-automatically-infering-type-in-read-csv – Hakim Mar 04 '16 at 07:24

1 Answer


Use the dtype option or converters in read_csv (see the read_csv docs). This works regardless of whether or not you use the python engine:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['00123', '00125'], 'col2': [1, 2], 'col3': [1.0, 2.0]})
df.to_csv('test.csv', index=False)
new_df = pd.read_csv('test.csv', dtype={'col1': str, 'col2': np.int64, 'col3': np.float64})
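
As a quick sanity check (still assuming the test.csv written above), the resulting dtypes show col1 kept as a string:

print(new_df.dtypes)            # col1 is object (string), col2 int64, col3 float64
print(new_df['col1'].tolist())  # ['00123', '00125'] - leading zeroes preserved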

If you simply use dtype=str, then every column will be read in as a string (object). You cannot do that with converters, since it expects a dictionary mapping columns to functions, but you could substitute converters for dtype in the code above and get the same result.
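
For example, the first two calls below sketch those two variants, and the last lines apply the same idea to a headerless file like the one in the question (no_header.csv and the names 'A', 'B', 'C' are placeholders for the real 35-column file):

all_str  = pd.read_csv('test.csv', dtype=str)                 # every column read back as object/string
via_conv = pd.read_csv('test.csv', converters={'col1': str})  # converters wants a dict: column -> callable

# headerless file: pass header=None plus names, then key dtype (or converters) by those names
pd.DataFrame([[1, 2, '00123']]).to_csv('no_header.csv', index=False, header=False)
no_hdr = pd.read_csv('no_header.csv', header=None, names=['A', 'B', 'C'], dtype={'C': str})
print(no_hdr['C'].tolist())                                   # ['00123']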

Zak Keirn