
I have an XML file that looks like the example below.

Many of the text values start with a space or a \n (newline), or contain other stray whitespace. I'm working with xml.etree.ElementTree, and it parses this file fine.

But I want more! :) I tried to prettify this mess, without success. I tried many tutorials, but the result is never pretty XML.

<?xml version="1.0"?>
<import>
<article>
<name> Name with space
</name>
<source> Daily Telegraph
</source>
<number>72/2015
</number>
<page>10
</page>
<date>2015-03-26
</date>
<author> Tomas First
</author>
<description>Economy
</description>
<attachment>
</attachment>
<region>
</region>
<text>
 My text is here
</text>
</article>
<article>
<name> How to parse
</name>
<source> Internet article
</source>
<number>72/2015
</number>
<page>1
</page>
<date>2015-03-26
</date>
<author>Some author
</author>
<description> description
</description>
<attachment>
</attachment>
<region>
</region>
<text>
 My text here
</text>
</article>
</import>

When I tried other answers from SO, they generated either the same file or even messier XML.

Tomas Pytel
  • possible duplicate of [Pretty printing XML in python](http://stackoverflow.com/questions/749796/pretty-printing-xml-in-python) – Lukas Graf Apr 20 '15 at 21:49
  • The newlines are a major contributor to your answers generating messy XML, by the way. If you don't need newlines in your data, you could just strip them all out with something like `tr -d` (albeit skipping the first line -- something easily done), then almost any XML processor would do the right thing when told to pretty-print. – Charles Duffy Apr 20 '15 at 22:12
  • @LukasGraf If you look at the possible duplicate, it solves a different problem. I mean, this is not a duplicate. – Tomas Pytel May 19 '15 at 08:40
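
The whitespace-stripping idea from the comments can also be done inside Python rather than with `tr -d`. The following is a minimal sketch, not a definitive implementation: it assumes Python 3.9+ (for `ET.indent`) and that leading and trailing whitespace in the data is not significant. The helper names `strip_whitespace` and `pretty_xml` and the shortened `sample` document are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(root):
    """Strip leading/trailing whitespace from every text and tail, in place."""
    for elem in root.iter():
        if elem.text is not None:
            elem.text = elem.text.strip() or None
        if elem.tail is not None:
            elem.tail = elem.tail.strip() or None
    return root


def pretty_xml(xml_string, indent="  "):
    """Parse, clean up stray whitespace, and return an indented XML string."""
    root = strip_whitespace(ET.fromstring(xml_string))
    ET.indent(root, space=indent)  # assumption: Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Shortened version of the file from the question (hypothetical sample).
sample = """<?xml version="1.0"?>
<import>
<article>
<name> Name with space
</name>
<attachment>
</attachment>
<text>
 My text is here
</text>
</article>
</import>"""

print(pretty_xml(sample))
```

Because the stray newlines are removed before indenting, each leaf element ends up on a single line, which is the behaviour the comment above describes.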

1 Answer


bs4 can do it

from bs4 import BeautifulSoup

# xmlstring holds the XML document as a string.
# The 'xml' parser requires lxml to be installed.
doc = BeautifulSoup(xmlstring, 'xml')

print(doc.prettify())
Eric
  • Thank you. This helped me. I'm going to rewrite my code to reflect the new pretty XML :) – Tomas Pytel Apr 20 '15 at 21:59
  • Here is the Python script that I wrote. It may be useful for someone. It parses Slovak-language terms. Many thanks. http://pastebin.com/wj7TnN2f – Tomas Pytel May 01 '15 at 18:58