
I'm using BeautifulSoup to make changes to an XML file, but I noticed that if I read in the file, pass it to the constructor, and write it straight back out without making any changes, BeautifulSoup has already altered it. For example, tag and attribute names are all lowercased, and the order of the attributes in a tag is changed.

I know that in practice this ought not to be an issue, but the program that has to read the XML file is very nitpicky and won't accept these changes. I've found I can make my changes with ordinary string operations instead of BeautifulSoup, but this is inconvenient.

Is there a way to prevent BeautifulSoup from making changes to the XML upon reading it?

Booley
  • BeautifulSoup is an HTML parser. Don’t use it on XML. – Ry- Jul 19 '14 at 01:30
  • @minitech Not entirely accurate. BS can be used for XML parsing by installing the xml parser library - http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser – shaktimaan Jul 19 '14 at 01:45
  • Include the code you used, a sample input and expected output. – shaktimaan Jul 19 '14 at 02:07

1 Answer


BeautifulSoup works by parsing the XML into objects that it understands. When it writes out the XML, it writes out the data in the objects, not the original string that was parsed.
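A quick way to see this effect is a minimal round trip. The sketch below swaps in the standard library's xml.etree.ElementTree for BeautifulSoup so it runs without third-party packages, and it assumes Python 3.8+ (where ElementTree keeps attributes in document order). Details that the parsed object never stores, such as single-quoted values and extra spaces between attributes, do not survive; the exact normalizations differ from parser to parser.

```python
import xml.etree.ElementTree as ET

# Source text with quirks a parser does not record:
# single-quoted values and extra whitespace between attributes.
source = "<Item  Name='Widget'   Id='7'/>"

# Parsing builds an object (tag name plus attribute dict);
# the original string is discarded.
elem = ET.fromstring(source)

# Serializing writes the object back out, so the quirks are normalized.
round_tripped = ET.tostring(elem, encoding="unicode")
print(round_tripped)
```

The output uses double quotes, single spaces, and ElementTree's own `" />"` style for the empty element, even though nothing was modified.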

One solution would be to parse the XML with BeautifulSoup, then use the resulting API to write the XML out yourself in the exact form your program requires. You can write a custom pretty printer, or just iterate through the elements, manually indenting and formatting the data to your needs.
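A minimal sketch of such a hand-written serializer follows. It recursively walks the tree and emits each tag with attributes in a caller-chosen order and a fixed indent. Assumptions: it uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, which keeps tag-name case as written (a parser that has already lowercased names cannot have them restored by any writer); `ATTR_ORDER`, `INDENT`, `ordered_attrs`, `format_attrs`, and `write_element` are hypothetical names; and the input is assumed to have element-only content. The same recursion could be written over BeautifulSoup `Tag` objects using their `.name` and `.attrs`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical: attribute order the downstream program expects, per tag name.
# Attributes not listed keep their parsed order after the listed ones.
ATTR_ORDER = {"Item": ["Id", "Name"]}

INDENT = "  "  # assumption: two-space indentation


def ordered_attrs(elem):
    """Return (name, value) pairs: names from ATTR_ORDER first, then the rest."""
    wanted = ATTR_ORDER.get(elem.tag, [])
    first = [(k, elem.attrib[k]) for k in wanted if k in elem.attrib]
    rest = [(k, v) for k, v in elem.attrib.items() if k not in wanted]
    return first + rest


def format_attrs(elem):
    """Render attributes as ' name="value"', escaping &, <, > and double quotes."""
    return "".join(
        ' {}="{}"'.format(k, escape(v, {'"': "&quot;"}))
        for k, v in ordered_attrs(elem)
    )


def write_element(elem, depth=0):
    """Serialize elem and its descendants as a list of indented lines.

    Assumes element-only content: surrounding whitespace in text is stripped,
    and text on elements that also have children, tails, and comments are
    not written.
    """
    pad = INDENT * depth
    open_tag = "<" + elem.tag + format_attrs(elem)
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not text:
        # Empty element: self-close with no space before "/>".
        return [pad + open_tag + "/>"]
    if not children:
        # Text-only element: keep it on one line.
        return [pad + open_tag + ">" + escape(text) + "</" + elem.tag + ">"]
    lines = [pad + open_tag + ">"]
    for child in children:
        lines.extend(write_element(child, depth + 1))
    lines.append(pad + "</" + elem.tag + ">")
    return lines


if __name__ == "__main__":
    source = '<Catalog><Item Name="Widget" Id="7"><Price>3</Price></Item><Empty/></Catalog>'
    print("\n".join(write_element(ET.fromstring(source))))
```

With the sample input, `Item`'s attributes come out as `Id` then `Name`, and tag names keep their original case. Because every formatting decision lives in the writer, matching a picky consumer becomes a matter of adjusting these few functions rather than post-processing strings.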

Oleg