
I am using the following code to read the CSV file, replace nulls and write the result to another file:

import pandas as pd

with open('result.csv', 'a') as f:
    for chunk in pd.read_csv('personaldata.csv', chunksize=10000):
        chunk.fillna('N/A', inplace=True)
        chunk.to_csv(f, sep=',', index=False)

This code works fine and replaces the null values with N/A, but the problem is that any column whose values start with zeros (0) has those leading zeros stripped.

My CSV file looks like this:

id,CompanyNumber,oldAppointmentType
1, 00000006,     "Current Secretary"
5, 00000058,     "Current Secretary"

When I write the result in the new file, the column CompanyNumber looks like this:

CompanyNumber
       6
      58

I thought it might be because I am using chunksize, so I split the file into several files and tried without chunksize, but the result was the same. I also tried printing with print(chunk.head()), and the zeros were already missing there. This is probably happening to the other columns with leading zeros as well (I haven't checked, because I have about 40 columns).

Does anyone know what is causing this behavior?

Porjaz
    If need convert all columns to `str`, add parameter `dtype=str` like `pd.read_csv('personaldata.csv', chunksize=10000, dtype=str)` – jezrael Jan 25 '17 at 09:09
    You need to specify the `dtype` for that column to be `str`, pass `dtype={'CompanyNumber':str}` or `dtype={'CompanyNumber':object}` – EdChum Jan 25 '17 at 09:09
    This question has been asked before, I'll find the dupe – EdChum Jan 25 '17 at 09:10
  • I searched for an answer but didn't find one, which is why I posted the question, but yes, it does seem like a duplicate. BTW thank you, the answers solved the problem – Porjaz Jan 25 '17 at 09:13
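As the comments suggest, the leading zeros disappear because pandas infers CompanyNumber as an integer column; passing dtype for that column keeps it as text. A minimal, self-contained sketch of the fix (io.StringIO stands in for 'personaldata.csv' here, and the header flag avoids repeating the header row for every chunk when appending):

```python
import io
import pandas as pd

# Sample data mirroring the question's file, with one missing value.
raw = (
    "id,CompanyNumber,oldAppointmentType\n"
    "1,00000006,Current Secretary\n"
    "5,,Current Secretary\n"
)

out = io.StringIO()  # stands in for open('result.csv', 'a')
header_written = False
# dtype={'CompanyNumber': str} keeps the column as text, preserving zeros.
for chunk in pd.read_csv(io.StringIO(raw), chunksize=1,
                         dtype={'CompanyNumber': str}):
    chunk.fillna('N/A', inplace=True)
    # Write the header only for the first chunk.
    chunk.to_csv(out, index=False, header=not header_written)
    header_written = True

result = out.getvalue()
print(result)
```

With dtype=str instead of the per-column mapping, every column is read as text, which also works if many of the 40 columns carry leading zeros.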
