
After I installed BeautifulSoup, this warning appears whenever I run my Python script from the command line:

D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166:
UserWarning: No parser was explicitly specified, so I'm using the best 
available HTML parser for this system ("html.parser"). This usually isn't a
problem, but if you run this code on another system, or in a different
virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

I have no idea why it appears or how to fix it.

Gino Mempin
jellyfishhuang
    The message is telling you exactly what to do: `BeautifulSoup([your markup], "html.parser")`. Did you do that and see what your output is? BeautifulSoup is trying to make your life easier. Listen to the Soup. :) – idjaw Nov 04 '15 at 00:14
    Change code such as `soup = BeautifulSoup(html)` to `soup = BeautifulSoup(html, "html.parser")`. – Remi Guan Nov 04 '15 at 00:21

4 Answers


The solution to your problem is clearly stated in the warning message. Code like the following does not specify a parser (XML, HTML, etc.):

BeautifulSoup( ... )

To get rid of the warning, specify which parser you'd like to use, like so:

BeautifulSoup( ..., "html.parser" )

You can also install a third-party parser, such as lxml or html5lib, if you'd like.
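Here is a minimal sketch that passes the parser explicitly and checks that no parser-guessing warning was recorded. It assumes a bs4 release that exports `GuessedAtParserWarning`. The helper name `parse_explicitly` is hypothetical.

```python
import warnings

from bs4 import BeautifulSoup, GuessedAtParserWarning


def parse_explicitly(markup, parser="html.parser"):
    """Parse markup with a named parser (hypothetical helper).

    Returns the soup plus any GuessedAtParserWarning instances raised
    while parsing. The list should be empty because a parser was named.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        soup = BeautifulSoup(markup, parser)
    guessed = [w for w in caught if issubclass(w.category, GuessedAtParserWarning)]
    return soup, guessed


soup, guessed = parse_explicitly("<p>Hello, <b>soup</b></p>")
print(soup.p.get_text())  # Hello, soup
print(len(guessed))       # 0
```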

Ethan Bierlein
    See Beautiful Soup's [installing a parser](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser) docs for the advantages/disadvantages of some common parsers (html.parser, lxml, html5lib) – Jonny Dec 11 '19 at 13:54

The documentation recommends installing and using lxml for speed:

BeautifulSoup(html, "lxml")

If you’re using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it’s essential that you install lxml or html5lib: Python’s built-in HTML parser is just not very good in older versions.

Installing the lxml parser

  • On Ubuntu/Debian

    apt-get install python-lxml    (python3-lxml for Python 3)
    
  • On Fedora (RHEL-based)

    dnf install python-lxml    (python3-lxml for Python 3)
    
  • Using pip

    pip install lxml
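After installing, you can confirm that Beautiful Soup can find the parser by asking for it by name. If the library is missing, `BeautifulSoup` raises `bs4.FeatureNotFound` rather than falling back to another parser. Here is a minimal sketch; the helper name `parser_available` is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parser_available(name):
    """Return True if Beautiful Soup can find a tree builder called `name`."""
    try:
        BeautifulSoup("<p></p>", name)
    except FeatureNotFound:
        # Raised when no installed tree builder matches the requested name.
        return False
    return True


for name in ("lxml", "html5lib", "html.parser"):
    print(name, parser_available(name))
```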
    
Gayan Weerakutti

In my opinion, the previous answers did not actually answer the question.

Yes, as everyone said, you can remove the warning by specifying the parser.
And as the documentation points out, specifying it is best practice for both performance and consistency.

But in some cases you want to silence the warning instead, hence this answer.

  • since BeautifulSoup 4 rev 460, the warning message does not appear in interactive (REPL) mode
  • there are more general answers on controlling Python warnings at "How to disable Python warnings?" (TL;DR: PYTHONWARNINGS=ignore or -Wignore)
  • suppressing the warning explicitly (bs4 ≥ rev 569) by adding to your code:
    import warnings
    from bs4 import GuessedAtParserWarning
    warnings.filterwarnings('ignore', category=GuessedAtParserWarning)
    
  • cheating by letting bs4 think you provided the parser, i.e.:
    import bs4
    bs4.BeautifulSoup(
      your_markup,
      builder=bs4.builder_registry.lookup(*bs4.BeautifulSoup.DEFAULT_BUILDER_FEATURES)
    )
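A global `filterwarnings` call affects the whole program. If you only want to silence the warning around one parse, you can scope the filter with `warnings.catch_warnings()`, which restores the previous filter state on exit. This is a minimal sketch of that variation, not part of the answer above. The helper name `parse_without_parser_warning` is hypothetical.

```python
import warnings

from bs4 import BeautifulSoup, GuessedAtParserWarning


def parse_without_parser_warning(markup):
    """Parse with the default (guessed) parser, hiding only the guess warning.

    catch_warnings() restores the previous filter list on exit, so the
    "ignore" rule applies to this call only, not to the whole program.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=GuessedAtParserWarning)
        return BeautifulSoup(markup)


soup = parse_without_parser_warning("<ul><li>one</li><li>two</li></ul>")
print([li.get_text() for li in soup.find_all("li")])  # ['one', 'two']
```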
    
Spherical Cowboy
bufh
  • To quote PEP-20, point 2: "Explicit is better than implicit". Don't hide warnings when the proper fix is less code. Adding "html5lib" or "html.parser" is trivial in an IDE or code editor, and almost-trivial on the command line. Just fix the problem, don't hide the symptom. – Mike 'Pomax' Kamermans Jul 15 '20 at 18:59
    Thank you! Sometimes a library spams the warning, and filtering it is the solution – sor.rge Nov 10 '22 at 16:14
  • And sometimes you're actually parsing XHTML which is, after all, an XML document. Thus specifying `lxml` is correct and you still get the warning. – Craig Trader Jul 04 '23 at 17:25

To use the html5lib parser, install it first:

pip install html5lib

then pass 'html5lib' as the parser argument to BeautifulSoup:

import bs4
# req1 is assumed to be a requests.Response, e.g. req1 = requests.get(url)
htmlDoc = bs4.BeautifulSoup(req1.text, 'html5lib')
print(htmlDoc)
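Naming the parser explicitly matters because parsers repair broken markup differently. That is what the warning means when it says the code may "behave differently". The sketch below parses the same fragment with each parser that is installed and skips the rest. The helper name `compare_parsers` is hypothetical.

The Beautiful Soup documentation uses `<a></p>` as its example. html.parser drops the stray `</p>` and gives `<a></a>`. html5lib wraps the result in `<html>`/`<head>`/`<body>` and keeps an empty `<p></p>` inside the `<a>`.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each installed parser (hypothetical helper).

    Returns {parser_name: serialized_tree}; parsers that are not installed
    are skipped instead of raising FeatureNotFound.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            continue  # parser library not installed on this system
    return results


for name, tree in compare_parsers("<a></p>").items():
    print(f"{name:12} {tree}")
```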
Wilson Wu