I've been trying for a few days now to split one large .xml file into several smaller .xml files in Python, but I haven't succeeded yet, so I'm asking for your help.

My large .xml file looks like this:

<Root>
     <Testcase>
          <Info1>[]</Info1>
          <Info2>[]</Info2>
     </Testcase>
     <Testcase>
          <Info1>[]</Info1>
          <Info2>[]</Info2>
     </Testcase>
     ...
     ...
     ...
     <Testcase>
          <Info1>[]</Info1>
          <Info2>[]</Info2>
     </Testcase>
</Root>

It has over 2000 children, and I would like to parse this .xml file and split it into smaller .xml files with 100 children each. That would result in about 20 new .xml files.

How can I do that?

Thank you!

Later edit:

I've tried to parse the .xml file using xml.etree.ElementTree:

import xml.etree.ElementTree as ET

tree = ET.parse('Testcase.xml')
root = tree.getroot()

# Count the <Testcase> children of the root
total_testcases = len(root.findall('Testcase'))

# Number of output files at 100 test cases each, rounded up
nr_of_files = (total_testcases + 99) // 100

for i in range(nr_of_files):
    # Problem: this writes the whole tree every time, not a group of 100
    tree.write('Testcase%d.xml' % i, encoding="UTF-8")

The thing is, I don't know how to take only a specific group of Testcase elements and copy them to another file.

Ciobby
  • Add a sample of your XML file *with* data. – bad_keypoints Oct 20 '15 at 08:12
  • I googled a bit and came across this: http://stackoverflow.com/questions/7336694/how-to-split-an-xml-file-the-simple-way-in-python. This seems like it will solve your problem. – bad_keypoints Oct 20 '15 at 08:18
  • I've seen that post too, but I didn't quite understand how it works, and it doesn't say how to create other files with information from the first tree. Also, @bad_keypoints, the information in the .xml is irrelevant; it looks just like I described. – Ciobby Oct 20 '15 at 08:22

1 Answer

Actually, root.findall('Testcase') returns a list of the "Testcase" sub-elements. So what you need to do is:

  1. Create a new root element.
  2. Add the sub-elements you want to that new root.

Here is an example:

>>> tcs = root.findall('Testcase')
>>> tcs
[<Element 'Testcase' at 0x23e14e0>, <Element 'Testcase' at 0x23e1828>]
>>> len(tcs)
2
>>> r = ET.Element('Root')
>>> r.append(tcs[0])
>>> ET.tostring(r, 'utf-8')
'<Root><Testcase>\n          <Info1>[]</Info1>\n          <Info2>[]</Info2>\n     </Testcase>\n     </Root>'
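
Extending the session above to the whole task, one possible sketch is to slice the list returned by findall into groups of 100 and attach each group to a fresh root. The helper name split_testcases and the generated 250-element sample are made up for illustration; they are not from the question.

```python
import xml.etree.ElementTree as ET


def split_testcases(root, chunk_size=100):
    """Return new root elements, each holding up to chunk_size <Testcase> children."""
    testcases = root.findall('Testcase')
    chunks = []
    for start in range(0, len(testcases), chunk_size):
        # Same tag as the original root ('Root' in the question)
        new_root = ET.Element(root.tag)
        # Attach the next slice of test cases to the new root
        new_root.extend(testcases[start:start + chunk_size])
        chunks.append(new_root)
    return chunks


# Hypothetical stand-in for the real file: 250 identical test cases
sample = ET.fromstring(
    '<Root>'
    + '<Testcase><Info1>[]</Info1><Info2>[]</Info2></Testcase>' * 250
    + '</Root>'
)
chunks = split_testcases(sample)
print([len(c) for c in chunks])  # [100, 100, 50]
```

Each element in chunks can then be serialized on its own. The Testcase elements are shared with the original tree rather than copied, which should be fine as long as they are only written out and not modified.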
Rainman
  • Thanks @Rainman for the answer, but how can I write to a file instead of using ET.tostring(r, 'utf-8')? – Ciobby Oct 20 '15 at 10:59
  • Create the XML files and write the return value of ET.tostring to each one: with open("test.xml", "w+") as f: f.write(ET.tostring(r, 'utf-8')) – Rainman Oct 21 '15 at 01:15
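
A note on the last comment: on Python 3, ET.tostring(r, 'utf-8') returns bytes, so writing it to a file opened in text mode raises a TypeError. One way around that, sketched below, is to wrap the element in an ElementTree and let its write method handle the file. The helper name write_element, the temporary directory, and the tiny sample element are assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_element(element, path):
    """Write element as a standalone XML document at path."""
    # ElementTree.write opens the file itself and handles the encoding,
    # so there is no bytes-versus-str mismatch to deal with.
    ET.ElementTree(element).write(path, encoding='UTF-8', xml_declaration=True)


# Small hypothetical root, standing in for one group of test cases
r = ET.Element('Root')
tc = ET.SubElement(r, 'Testcase')
ET.SubElement(tc, 'Info1').text = '[]'

# Write into a temporary directory so the example does not touch real files
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'Testcase0.xml')
write_element(r, out_path)

# Read the file back to check that it parses
print(ET.parse(out_path).getroot()[0].tag)  # Testcase
```

In a loop over the groups of 100, the path could be built with 'Testcase%d.xml' % i, as in the question.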