
After reading the data from a txt file, I have a dataframe (df1) like the following:

 name   l1     l2
  a    00000  00000 
  b    00010  00002
  c    00000  01218

I write it to a CSV file with the following Python code:

df1.to_csv('test.csv', index=False)

Then I use the following code to read:

  df = pd.read_csv('test.csv')

I found that the resulting dataframe (df2) looks like this:

 name   l1     l2
  a     0      0
  b     10     2
  c     0      1218

But I want to keep the leading zeros in the dataframe, as in df1.

Thanks!

jakevdp
tktktk0711

1 Answer


The leading zeros are being removed because Pandas is implicitly converting the values to integer types. You want to read the data as string types, which you can do by specifying dtype=str:

pd.read_csv('test.csv', dtype=str)
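
To see the difference side by side, here is a minimal sketch of the round trip from the question. It rebuilds df1 by hand and writes to an in-memory buffer via io.StringIO rather than to test.csv; both are assumptions made only to keep the example self-contained.

```python
import io

import pandas as pd

# Rebuild the question's df1 with zero-padded values stored as strings.
df1 = pd.DataFrame({
    "name": ["a", "b", "c"],
    "l1": ["00000", "00010", "00000"],
    "l2": ["00000", "00002", "01218"],
})

# to_csv returns the CSV text when no path is given
# (used here instead of writing test.csv to disk).
csv_text = df1.to_csv(index=False)

# Default parsing: the numeric-looking columns are inferred as integers,
# so the leading zeros are lost.
df_default = pd.read_csv(io.StringIO(csv_text))

# dtype=str: every column is kept as text, so the zeros survive.
df_str = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(df_default)
print(df_str)
```

Note that dtype=str applies to every column, so any genuinely numeric columns would need to be converted back afterwards.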

Update, in case it helps others:

To read most or only selected columns as str, you can do this:

# list of column names that need to be read as strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
ihightower
jakevdp
  • How can I read only one column as a string and let Pandas infer the rest automatically? In my case, only the customer ID column has leading zeros. – rAmAnA Mar 05 '19 at 18:13
  • The dtype argument can specify a mapping of column name to dtype. See the [read_csv documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) for details. – jakevdp Mar 05 '19 at 18:43
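
As a rough illustration of the mapping described in the comment above, the sketch below forces a single column to str and leaves the others to Pandas' type inference. The column names (customer_id, amount, city) and the inline CSV text are hypothetical, made up only for this example.

```python
import io

import pandas as pd

# Hypothetical CSV: only customer_id carries leading zeros.
csv_text = (
    "customer_id,amount,city\n"
    "00123,10.5,Oslo\n"
    "04567,20,Lima\n"
)

# Map just the one column to str; columns not listed in the
# dict are still inferred automatically.
df = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})

print(df)
print(df.dtypes)
```

Here customer_id keeps its zeros as text, while amount is still parsed as a number.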