
I have a problem reading in a CSV with an id field that has mixed dtypes in the original source data, i.e. the id field can be 11, 2R399004, BL327838, 7 etc., but the vast majority of the values are 8 characters long.

When I read it with multiple versions of pd.read_csv and encoding='iso-8859-1', it always converts the 7 and 11 to 00000007 or the like. I've tried using utf-8, but I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 40: unexpected end of data

I have tried setting dtype={'field': object} and dtype={'field': str}, as well as various iterations of latin-1 and the like, but it continually does this.
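
Roughly what I'm calling, with the file path and column name changed to placeholders:

    import pandas as pd

    # 'source_file.csv' and 'field' stand in for the real file and column names
    df = pd.read_csv(
        'source_file.csv',
        encoding='iso-8859-1',   # utf-8 raises the UnicodeDecodeError above
        dtype={'field': str},    # also tried object; the short IDs still come back zero-padded
    )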

Is there any way to get around this error, without going through every individual file and fixing the dtypes?

    What is the encoding of the files? What are the dtypes of the field, if you import it with `'iso-8859-1'`? When you say "multiple versions of pd.read_csv," what do you mean? – Evan Feb 07 '18 at 18:25
  • Have you tried this? https://stackoverflow.com/questions/13142347/how-to-remove-leading-and-trailing-zeros-in-a-string-python – Evan Feb 07 '18 at 18:30
  • Can we see some sample records from your csv file? – Bill Bell Feb 07 '18 at 19:49

1 Answer


Basically the column looks like this:

    Column_ID
    10
    HGF6558
    059
    KP257
    0001
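
A minimal sketch of one way to handle a column like this (the file name is assumed, and whether to strip zeros depends on whether the padding is genuinely part of the IDs):

    import pandas as pd

    # force the whole column to strings so pandas never converts the values
    df = pd.read_csv('data.csv', encoding='iso-8859-1', dtype={'Column_ID': str})

    # if the leading zeros were added somewhere upstream and are not part of the
    # real IDs, they can be stripped afterwards (note this also removes genuine
    # leading zeros, e.g. '059' becomes '59'):
    df['Column_ID'] = df['Column_ID'].str.lstrip('0')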