I'm having trouble reading a CSV whose id field has mixed types in the original source data: an id can be 11, 2R399004, BL327838, 7, etc., although the vast majority are 8 characters long.
When I read it with several variations of pd.read_csv and encoding='iso-8859-1', the 7 and 11 always come back as 00000007 or similar. I've tried using utf-8 instead, but I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 40: unexpected end of data
I have tried setting dtype={'field': object}, a string dtype, and various encodings such as latin-1, but the short ids still come back padded every time.
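For reference, here is a minimal sketch of the kind of call described above. The column name `field`, the `name` column, and the sample rows are made up for illustration; the real files are larger. On this toy input the short ids come back unchanged, which may suggest checking whether the zero-padding is already present in the raw files rather than being introduced by pandas.

```python
import io

import pandas as pd

# Made-up sample rows; "field" is a placeholder for the real id column.
# The name column contains "É", which is the single byte 0xC9 in ISO-8859-1
# and is not valid on its own in UTF-8.
raw = (
    "field,name\n"
    "7,ÉMILE\n"
    "11,ANNA\n"
    "2R399004,BOB\n"
    "BL327838,CARL\n"
).encode("iso-8859-1")

# Read the id column as text so that no numeric inference is applied to it.
df = pd.read_csv(io.BytesIO(raw), encoding="iso-8859-1", dtype={"field": str})

ids = df["field"].tolist()
print(ids)
```

If the same call on a real file still shows 00000007, printing the first few raw lines of that file (opened in binary mode) would show whether the padding is in the data itself.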
Is there any way to get around this without going through every individual file and fixing the dtypes?