8

I've written a script that prints out all the .xml files in the current directory in xml format, but I can't figure out how to add the xmlns attributes to the top-level tag. The output I want to get is:

<?xml version='1.0' encoding='utf-8'?>
<databaseChangeLog 
      xmlns="http://www.host.org/xml/ns/dbchangelog"
      xmlns:xsi="http://www.host.org/2001/XMLSchema-instance"
      xsi:schemaLocation="www.host.org/xml/ns/dbchangelog">

    <include file="cats.xml"/>
    <include file="dogs.xml"/>
    <include file="fish.xml"/>
    <include file="meerkats.xml"/>

</databaseChangLog>

However, here is the output I am getting:

 <?xml version='1.0' encoding='utf-8'?>
 <databaseChangeLog>
    <include file="cats.xml"/>
    <include file="dogs.xml"/>
    <include file="fish.xml"/>
    <include file="meerkats.xml"/>
 </databaseChangLog>

Here is my script:

import lxml.etree
import lxml.builder
import glob

E = lxml.builder.ElementMaker()
ROOT = E.databaseChangeLog
DOC = E.include

# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]
the_doc = ROOT(*files)

str = lxml.etree.tostring(the_doc, pretty_print=True, xml_declaration=True, encoding='utf-8')

print str

I've found some examples online of explicitly setting namespace attributes, here and here, but to be honest they went a little over my head as I am just starting out. Is there another way to add these xmlns attributes to the databaseChangeLog tag?

Community
  • 1
  • 1
user1420913
  • 367
  • 1
  • 5
  • 13

1 Answers1

10
import lxml.etree as ET
import lxml.builder
import glob

dbchangelog = 'http://www.host.org/xml/ns/dbchangelog'
xsi = 'http://www.host.org/2001/XMLSchema-instance'
E = lxml.builder.ElementMaker(
    nsmap={
        None: dbchangelog,
        'xsi': xsi})

ROOT = E.databaseChangeLog
DOC = E.include

# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]

the_doc = ROOT(*files)
the_doc.attrib['{{{pre}}}schemaLocation'.format(pre=xsi)] = 'www.host.org/xml/ns/dbchangelog'

print(ET.tostring(the_doc,
                  pretty_print=True, xml_declaration=True, encoding='utf-8'))

yields

<?xml version='1.0' encoding='utf-8'?>
<databaseChangeLog xmlns:xsi="http://www.host.org/2001/XMLSchema-instance" xmlns="http://www.host.org/xml/ns/dbchangelog" xsi:schemaLocation="www.host.org/xml/ns/dbchangelog">
  <include file="test.xml"/>
</databaseChangeLog>
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • That worked, thanks so much! Would you happen to know how to format the output similar to what is in my desired output above? I thought pretty_print might help with that but it doesn't seem to help there :( – user1420913 Mar 12 '13 at 20:15
  • Sorry, I don't know of a robust way to do that. – unutbu Mar 12 '13 at 21:32
  • Really hope you're still around, almost 5 years later. This helped me a bunch, but I don't understand the construct `'{{{pre}}}schemaLocation'.format(pre=xsi)`. I understand format, but why the three braces around pre? – Todd Vanyo Feb 25 '18 at 03:15
  • We want a XML qualified name of the form [`'{uri}local'`](http://effbot.org/zone/element-namespaces.htm#element-tree-representation). This string has literal braces, but braces have a special meaning to the `format` method so we need to "escape" them to tell `format` we want literal braces instead of string substitution. ["If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}."](https://docs.python.org/3/library/string.html#format-string-syntax). In other words, when `format` encounters 2 braces it returns a string with 1 literal brace. – unutbu Feb 25 '18 at 11:58