How do I convert XML to CSV when some attributes are missing from the XML file?

Question

I have some simple XML data, shown below:

<LocationList>
  <Location dateTime="2018-11-17T00:11:01+09:00" x="2711.208" y="566.3292" z="0" motion="Walk" isMoving="True" stepCount="1" groupAreaId="1" commit="True" />
  <Location dateTime="2018-11-17T00:11:02+09:00" x="2640.506" y="518.7352" z="0" motion="Walk" isMoving="True" stepCount="1" groupAreaId="1" commit="True" />
  <Location dateTime="2018-11-17T00:11:03+09:00" x="2640.506" y="518.7352" z="0" motion="Stop" isMoving="False" stepCount="0" groupAreaId="1" />
  <Location dateTime="2018-11-17T00:52:31+09:00" x="2516.404" y="574.0547" z="0" motion="Walk" isMoving="True" stepCount="1" groupAreaId="1" />
</LocationList>

and I tried the following to convert the XML into a CSV file:

import xml.etree.ElementTree as et
import csv

tree = et.parse('./1_2018-11-17.xml')
nodes = tree.getroot()
with open('testxml1.csv', 'w') as ff:
    cols = ['dateTime','x','y','z','motion','isMoving','stepCount',
            'groupAreaId','commit']
    nodewriter = csv.writer(ff)
    nodewriter.writerow(cols)
    for node in nodes:
        values = [ node.attrib[kk] for kk in cols]
        nodewriter.writerow(values)

However, since not every XML element has the 'stepCount', 'groupAreaId', and 'commit' attributes, the code fails with a KeyError unless I remove those columns.

How can I write all the columns to the CSV file, leaving the cell empty wherever an attribute is missing?

score 2 · Accepted Answer · answered Jul 04 '19 at 15:26


If you read the node attributes with the .get() method instead of indexing, you can supply a default value, such as an empty string, for any attribute that is missing. In your case it would look like this:

for node in nodes:
    values = [node.attrib.get(kk, '') for kk in cols]
    nodewriter.writerow(values)
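
For reference, here is a minimal self-contained sketch of the .get() approach, run against two rows from the question's sample held in a string instead of a file. The helper name xml_to_csv_text and the in-memory io.StringIO buffer are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as et

# Two rows taken from the question; the second has no "commit" attribute.
SAMPLE_XML = """<LocationList>
  <Location dateTime="2018-11-17T00:11:02+09:00" x="2640.506" y="518.7352" z="0" motion="Walk" isMoving="True" stepCount="1" groupAreaId="1" commit="True" />
  <Location dateTime="2018-11-17T00:11:03+09:00" x="2640.506" y="518.7352" z="0" motion="Stop" isMoving="False" stepCount="0" groupAreaId="1" />
</LocationList>"""

COLS = ['dateTime', 'x', 'y', 'z', 'motion', 'isMoving', 'stepCount',
        'groupAreaId', 'commit']


def xml_to_csv_text(xml_text, cols=COLS):
    """Hypothetical helper: return CSV text with one row per child element."""
    root = et.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(cols)
    for node in root:
        # .get() returns '' instead of raising KeyError for a missing attribute
        writer.writerow([node.attrib.get(kk, '') for kk in cols])
    return buffer.getvalue()


csv_text = xml_to_csv_text(SAMPLE_XML)
print(csv_text)
```

The row without a commit attribute ends with a trailing comma, which is how an empty last field appears in CSV.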

answered Jul 04 '19 at 15:26

Jesper


Also, I know this should be asked as a separate question, but how do I produce several CSV files from several XML files? I tried glob, but it did not work well. – npm Jul 05 '19 at 03:45

@npm It should be asked in another question but if you want to loop through files, you could check out this: https://stackoverflow.com/questions/10377998/how-can-i-iterate-over-files-in-a-given-directory – Jesper Jul 05 '19 at 06:35
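
As a rough sketch of the multi-file case raised in the comment above, one option is to loop over the matching files with pathlib and write one CSV next to each XML file. The function names convert_file and convert_directory, and the choice to reuse each file's name with a .csv suffix, are assumptions for illustration and do not come from the thread.

```python
import csv
import xml.etree.ElementTree as et
from pathlib import Path

COLS = ['dateTime', 'x', 'y', 'z', 'motion', 'isMoving', 'stepCount',
        'groupAreaId', 'commit']


def convert_file(xml_path, csv_path, cols=COLS):
    """Hypothetical helper: write one CSV row per child of the XML root."""
    root = et.parse(xml_path).getroot()
    # newline='' lets the csv module control line endings itself
    with open(csv_path, 'w', newline='') as ff:
        writer = csv.writer(ff)
        writer.writerow(cols)
        for node in root:
            writer.writerow([node.attrib.get(kk, '') for kk in cols])


def convert_directory(src_dir, cols=COLS):
    """Hypothetical helper: convert every *.xml file directly inside src_dir."""
    written = []
    for xml_path in sorted(Path(src_dir).glob('*.xml')):
        csv_path = xml_path.with_suffix('.csv')  # e.g. a.xml -> a.csv
        convert_file(xml_path, csv_path, cols)
        written.append(csv_path)
    return written
```

Calling convert_directory('.') would convert every XML file in the current directory; Path.glob('*.xml') does not descend into subdirectories.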

score 1 · Answer 2 · answered Jul 04 '19 at 15:25

You can use a conditional expression (if/else) inside the list comprehension to check whether each attribute exists.

import xml.etree.ElementTree as et
import csv

tree = et.parse('./1_2018-11-17.xml')
nodes = tree.getroot()
with open('testxml1.csv', 'w', newline='') as ff:  # newline='' is recommended for csv files
    cols = ['dateTime', 'x', 'y', 'z', 'motion', 'isMoving', 'stepCount', 'groupAreaId', 'commit']
    nodewriter = csv.writer(ff)
    nodewriter.writerow(cols)
    for node in nodes:
        # if kk is not an attribute, set the value to None (csv.writer writes None as an empty field)
        values = [node.attrib[kk] if kk in node.attrib else None for kk in cols]
        # replace a missing commit value (the last column) with False
        if values[-1] is None:
            values[-1] = False
        nodewriter.writerow(values)
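
A related variation, offered only as a sketch, is csv.DictWriter, whose restval parameter fills in any missing column automatically. The second sample row below has been trimmed for illustration (stepCount and groupAreaId removed as well as commit), and defaulting commit to 'False' mirrors the choice made in the code above.

```python
import csv
import io
import xml.etree.ElementTree as et

# First row is from the question; the second is a trimmed, hypothetical row
# that lacks stepCount, groupAreaId and commit.
SAMPLE_XML = """<LocationList>
  <Location dateTime="2018-11-17T00:11:02+09:00" x="2640.506" y="518.7352" z="0" motion="Walk" isMoving="True" stepCount="1" groupAreaId="1" commit="True" />
  <Location dateTime="2018-11-17T00:11:03+09:00" x="2640.506" y="518.7352" z="0" motion="Stop" isMoving="False" />
</LocationList>"""

COLS = ['dateTime', 'x', 'y', 'z', 'motion', 'isMoving', 'stepCount',
        'groupAreaId', 'commit']

root = et.fromstring(SAMPLE_XML)
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=COLS,
    restval='',             # value written for any column missing from a row
    extrasaction='ignore',  # silently skip attributes that are not in COLS
    lineterminator='\n',
)
writer.writeheader()
for node in root:
    row = dict(node.attrib)
    row.setdefault('commit', 'False')  # per-column default, as in the answer
    writer.writerow(row)

dict_csv_text = buffer.getvalue()
print(dict_csv_text)
```

With this layout the list of columns is stated once in fieldnames, and each row is passed as a dictionary, so no list comprehension over cols is needed.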
