
This question is specific to BeautifulSoup4, which makes it different from the previous questions:

Why is BeautifulSoup modifying my self-closing elements?

selfClosingTags in BeautifulSoup

Since BeautifulStoneSoup is gone (the previous xml parser), how can I get bs4 to respect a new self-closing tag? For example:

import bs4   
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])

print soup.prettify()

This does not self-close the bar tag, but the warning gives a hint. What is the tree builder that bs4 refers to, and how do I self-close the tag?

/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
  "BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>
Hooked

1 Answer


To parse XML you pass in “xml” as the second argument to the BeautifulSoup constructor.

soup = bs4.BeautifulSoup(S, 'xml')

You’ll need to have lxml installed, since it is the parser bs4 uses for the “xml” feature.

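You don't need to pass selfClosingTags anymore. In XML mode the tree builder (the parser backend, here lxml) writes any element with no content as self-closing: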

In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print soup.prettify()
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
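
To make the "empty means self-closing" rule concrete, here is a minimal sketch, assuming lxml is installed. The input string and variable names are made up for illustration; `is_empty_element` is the bs4 attribute that reports whether a tag will be written in self-closing form.

```python
import bs4  # the 'xml' feature assumes lxml is installed

# Hypothetical input: <bar> has an explicit close tag, <baz> has text.
S = '<foo><bar a="3"></bar><baz>text</baz></foo>'
soup = bs4.BeautifulSoup(S, 'xml')

bar = soup.find('bar')
baz = soup.find('baz')

empty_flag = bar.is_empty_element   # truthy: <bar> has no contents
bar_empty = str(bar)                # <bar a="3"/>
baz_full = str(baz)                 # <baz>text</baz>

# Giving <bar> some content switches it back to an open/close pair.
bar.string = 'x'
bar_filled = str(bar)               # <bar a="3">x</bar>

print(bar_empty, baz_full, bar_filled)
```

So the form in the output depends only on whether the element has contents, not on how it was written in the input.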
Pavel Anossov
  • This works by passing the list of `selfClosingTags`, but it still gives the same warning as above. Am I doing something wrong? – Hooked Feb 19 '13 at 15:56
  • Never mind, the docs answer that question. It seems that in XML mode a tag is made self-closing automatically when its content is empty, and a list should not be passed. – Hooked Feb 19 '13 at 15:58
  • Right. Added a demonstration. – Pavel Anossov Feb 19 '13 at 16:01
  • Do you know if it's possible to use BS4 to *create* a self-closing tag (using [new_tag](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#beautifulsoup-new-string-and-new-tag))? – detly Feb 19 '14 at 06:48
  • @detly, if it's empty, it will be self-closing (in XML mode). – Pavel Anossov Feb 19 '14 at 12:56
  • Note to self: I once passed a string named `label` to a tag such as ``, without explicit quote marks and ended up with ``. Moral: Make sure you add the quote marks in self-closing tags (you may be able to get away with it if the tags are paired). And another 30 minutes of my life gone... – PatrickT May 02 '20 at 06:52
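
Following up on the `new_tag` question in the comments, a small sketch of creating a self-closing tag, again assuming lxml is installed. The tag and attribute names are just the ones from the question; nothing here is required by bs4.

```python
import bs4  # the 'xml' feature assumes lxml is installed

# Start from an empty <foo> document parsed in XML mode.
soup = bs4.BeautifulSoup('<foo></foo>', 'xml')

# new_tag() builds the element with the soup's XML tree builder,
# so it is allowed to be an empty (self-closing) element.
bar = soup.new_tag('bar')
bar['a'] = '3'            # attributes are set like dictionary keys
soup.foo.append(bar)

created = str(soup.foo)   # <foo><bar a="3"/></foo>
print(created)
```

Because the new tag has no children, it is written as `<bar a="3"/>`; appending anything to it would turn it into an open/close pair.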