
Take this test CSV file:

COLUMN1;COLUMN2;COLUMN3;COLUMN4;COLUMN5;COLUMN6;COLUMN7
CODE;1234;0123456789;0987654321;012345678987654321;012345;10110025

I want to convert this file to XML. To do so, I am using the code from this Stack Overflow answer. Here is the complete test code:

import pandas as pd

df = pd.read_csv('test.csv', sep=';')

def convert_row(row):
    # Fill the XML template with the seven values of one CSV row.
    return """<root>
    <column1>%s</column1>
    <column2>%s</column2>
    <column3>%s</column3>
    <column4>%s</column4>
    <column5>%s</column5>
    <column6>%s</column6>
    <column7>%s</column7>
</root>""" % (
    row.COLUMN1, row.COLUMN2, row.COLUMN3, row.COLUMN4, row.COLUMN5, row.COLUMN6, row.COLUMN7)

# One <root> element per CSV row, separated by newlines.
print('\n'.join(df.apply(convert_row, axis=1)))

However, every column value starting with a zero gets stripped of the leading zero character. This is the output:

<root>
    <column1>CODE</column1>
    <column2>1234</column2>
    <column3>123456789</column3>
    <column4>987654321</column4>
    <column5>12345678987654321</column5>
    <column6>12345</column6>
    <column7>10110025</column7> 
</root>

I thought using %s would keep the original string intact without modifying it in any way. Is this not the case?

How can I make sure that the XML output receives exactly the same value in the CSV file?

user1301428

1 Answer


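The problem doesn't lie with the string formatting but with the CSV import. When pandas reads the file, it infers a type for each column, so all-digit columns such as COLUMN3 are parsed as int64. An integer has no leading zeros, so by the time %s formats the value, the zero is already gone.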
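A minimal sketch to see this happen, assuming a reasonably recent pandas. The SAMPLE string (a name introduced here for illustration) is just the question's test.csv inlined through io.StringIO so the snippet runs without the file:

```python
import io

import pandas as pd

# The question's test.csv, inlined (SAMPLE is a hypothetical name for illustration).
SAMPLE = (
    "COLUMN1;COLUMN2;COLUMN3;COLUMN4;COLUMN5;COLUMN6;COLUMN7\n"
    "CODE;1234;0123456789;0987654321;012345678987654321;012345;10110025\n"
)

# Default parsing: pandas infers an integer dtype for the all-digit columns.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")
print(df.dtypes)

# The value that reaches "%s" is already an integer, so the leading zero is lost.
print("%s" % df.loc[0, "COLUMN3"])
```

Only COLUMN1 ("CODE") stays as text; every other column comes back as an integer type.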

Try df = pd.read_csv('test.csv', sep=';', dtype='str') to avoid this. It tells pandas to read every column as text, exactly as it appears in the file.
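Putting the fix together with the question's template gives a sketch like the one below, again assuming the test file is inlined as a hypothetical SAMPLE string instead of read from test.csv:

```python
import io

import pandas as pd

# The question's test.csv, inlined (SAMPLE is a hypothetical name for illustration).
SAMPLE = (
    "COLUMN1;COLUMN2;COLUMN3;COLUMN4;COLUMN5;COLUMN6;COLUMN7\n"
    "CODE;1234;0123456789;0987654321;012345678987654321;012345;10110025\n"
)

# dtype=str disables type inference: each cell keeps the exact text from the file.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";", dtype=str)

def convert_row(row):
    # Same template as in the question; %s now receives the original strings.
    return """<root>
    <column1>%s</column1>
    <column2>%s</column2>
    <column3>%s</column3>
    <column4>%s</column4>
    <column5>%s</column5>
    <column6>%s</column6>
    <column7>%s</column7>
</root>""" % (
        row.COLUMN1, row.COLUMN2, row.COLUMN3, row.COLUMN4,
        row.COLUMN5, row.COLUMN6, row.COLUMN7)

xml_output = "\n".join(df.apply(convert_row, axis=1))
print(xml_output)
```

One caveat worth checking against your real data: even with dtype=str, empty cells are still read as NaN and would be printed as "nan"; passing keep_default_na=False to read_csv keeps them as empty strings instead.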

Hope this helps!

Bart Van Loon