I have this XML file:

<Source>
    <segment1>
        <userRefNumber>test1</userRefNumber>
        <subscriber>
            <industryCode>ZZZZZ</industryCode>
            <memberCode>12345</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2021-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment1>
    <example2>
        <subscriber>
            <industryCode>ZAAAA</industryCode>
            <memberCode>999999</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2020-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </example2>
</Source>

Using Python, I'd like to create two XML files, one for each child of the root:

  • one xml file for segment1
  • one xml file for example2

xml1 should look like this:

<Source>
    <segment1>
        <userRefNumber>test1</userRefNumber>
        <subscriber>
            <industryCode>ZZZZZ</industryCode>
            <memberCode>12345</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2021-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment1>
</Source>

and xml2 should look like this:

<Source>
    <example2>
        <subscriber>
            <industryCode>ZAAAA</industryCode>
            <memberCode>999999</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2020-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </example2>
</Source>

The splitting criterion is this: create one separate XML file for each child element of the root. In the example above there are two such elements (segment1 and example2), so I'd like to create two XML files, one for each of them.

I have checked the answer to "Split one large .xml file in more .xml files (python)", but in that example the children all have the same name, so I guess the findall function doesn't apply to my case, where the children have different names (segment1 and example2). Is it possible to create one XML file per element, based on the order of the elements under the root?

Giampaolo Levorato

1 Answer

The code below seems to work. The main point is to loop over the root's child elements and check which of them have a tag that starts with segment.

import xml.etree.ElementTree as ET

xml = '''<Source>
    <document>response</document>
    <version>2.0</version>
    <segment1>
        <userRefNumber>test1</userRefNumber>
        <subscriber>
            <industryCode>ZZZZZ</industryCode>
            <memberCode>12345</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2021-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment1>
    <segment2>
        <userRefNumber>test2</userRefNumber>
        <subscriber>
            <industryCode>ZAAAA</industryCode>
            <memberCode>999999</memberCode>
            <inquirySubscriberPrefixCode>0622</inquirySubscriberPrefixCode>
        </subscriber>
        <options>
            <country>us</country>
            <language>en</language>
        </options>
        <tracking>
            <transactionTimeStamp>2020-02-25T04:09:30.508-06:00</transactionTimeStamp>
        </tracking>
    </segment2>
</Source>'''

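root = ET.fromstring(xml)
counter = 1
# Iterate over a copy of the root's direct children
for child in list(root):
    # Only children whose tag starts with 'segment' get their own file
    if child.tag.startswith('segment'):
        # Wrap the child in a fresh <Source> root
        src = ET.Element('Source')
        src.append(child)
        with open(f'out_{counter}.xml', 'w') as f:
            tree = ET.ElementTree(src)
            tree.write(f, encoding="unicode")
        counter += 1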
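
Note that the startswith('segment') filter would skip a child named example2, as in the question's XML. If the goal is one file per direct child of the root regardless of its tag name, a variation along these lines might work. This is only a sketch: the function name split_children and its skip_tags parameter are made up for illustration, the sample input is trimmed down, and the files go to a throwaway temporary directory.

```python
import copy
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def split_children(xml_text, skip_tags=()):
    """Return one XML string per direct child of the root, in document order.

    skip_tags is a hypothetical parameter: tags listed there (for example
    header elements such as 'document') do not get a file of their own.
    """
    root = ET.fromstring(xml_text)
    documents = []
    for child in root:
        if child.tag in skip_tags:
            continue
        # New root element with the same tag and attributes as the original
        wrapper = ET.Element(root.tag, root.attrib)
        # Deep copy so the parsed input tree is left untouched
        wrapper.append(copy.deepcopy(child))
        documents.append(ET.tostring(wrapper, encoding="unicode"))
    return documents


# Trimmed-down sample with differently named children
sample = """<Source>
    <document>response</document>
    <segment1><userRefNumber>test1</userRefNumber></segment1>
    <example2><memberCode>999999</memberCode></example2>
</Source>"""

parts = split_children(sample, skip_tags=("document",))

out_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo
for index, text in enumerate(parts, start=1):
    (out_dir / f"out_{index}.xml").write_text(text, encoding="utf-8")
```

Because the loop never looks at the tag name except to skip unwanted ones, the files come out in the order the elements appear under the root.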
balderman