Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

Beautiful Soup is a Python library for parsing HTML and XML files, which is useful in web scraping. It can use Python's standard HTML parser as well as other parsers such as lxml or html5lib. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
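As a minimal sketch of the navigating, searching, and modifying described above, the example below parses a small made-up HTML snippet with the standard-library parser. The markup, ids, and class names are assumptions chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = """
<html><body>
  <h1 id="title">Hello</h1>
  <p class="intro">First <b>bold</b> paragraph.</p>
  <p>Second paragraph.</p>
</body></html>
"""

# "html.parser" is the parser from Python's standard library;
# "lxml" or "html5lib" can be passed instead if they are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access returns the first tag with that name.
heading_text = soup.h1.get_text()

# Searching: find_all returns every match, find returns the first one.
paragraphs = soup.find_all("p")
intro = soup.find("p", class_="intro")

# Modifying: replace a tag's text and change an attribute in place.
soup.h1.string = "Goodbye"
intro["class"] = ["lead"]

print(heading_text, len(paragraphs), soup.h1.get_text(), intro["class"])
```

Passing the parser name explicitly keeps results consistent between machines, since different parsers can build different trees from the same malformed markup.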

Beautiful Soup 4 (commonly known as bs4, after the name of its Python module) is the latest version of Beautiful Soup, and is mostly backwards-compatible with Beautiful Soup 3. Beautiful Soup is published under the MIT License.

Since version 4.7.0, Beautiful Soup supports a wide range of CSS4 selectors, adding to its already rich collection of tools for selecting HTML/XML elements. The supported selectors and pseudo-classes are described in the documentation of soupsieve, the library bs4 uses for CSS selection.
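A short sketch of CSS selection through select() and select_one(), assuming bs4 4.7.0 or later. The menu markup and URLs below are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<ul id="menu">
  <li class="item active"><a href="/home">Home</a></li>
  <li class="item"><a href="/docs">Docs</a></li>
  <li class="item"><a href="https://example.com/">External</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# ID, class, and descendant combinators.
links = [a["href"] for a in soup.select("#menu li.item a")]

# Attribute selector: href values that start with "https://".
external = soup.select('a[href^="https://"]')

# Pseudo-classes: :not() and :nth-of-type().
inactive = soup.select("li.item:not(.active)")
second = soup.select_one("li:nth-of-type(2) a").get_text()

print(links, len(external), len(inactive), second)
```

select() always returns a list, while select_one() returns the first match or None, so the result of select_one() is worth checking before calling methods on it in real scraping code.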

To install the latest version with pip, run pip install beautifulsoup4. The library is then imported into a project like this: from bs4 import BeautifulSoup
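One quick way to confirm the installation worked is to import the package and parse a trivial string. This is only a sketch; the sample markup is made up.

```python
# Installed beforehand from a shell with: pip install beautifulsoup4
import bs4
from bs4 import BeautifulSoup

# The installed package exposes its version string.
print(bs4.__version__)

# Parse a trivial, made-up snippet with the standard-library parser.
soup = BeautifulSoup("<p>It <i>works</i></p>", "html.parser")
text = soup.get_text()
print(text)
```

If the import fails inside a virtual environment, the package was probably installed for a different interpreter than the one running the script.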

Note: Beautiful Soup 3 works only on Python 2.x, while Beautiful Soup 4 works on both Python 2 (2.7+) and Python 3. Beautiful Soup 4.9.3 was the last release to support Python 2; later releases require Python 3.

32305 questions

1488

votes

34 answers

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup. The problem is that the error is not always reproducible; it sometimes works with some pages, and…

asked Mar 30 '12 at 12:06

Homunculus Reticulli

65,167
81
216
341

646

votes

19 answers

How to find elements by class

I'm having trouble parsing HTML elements with "class" attribute using Beautifulsoup. The code looks like this soup = BeautifulSoup(sdata) mydivs = soup.findAll('div') for div in mydivs: if (div["class"] == "stylelistrow"): print div I…

python html web-scraping beautifulsoup

asked Feb 18 '11 at 11:58

Neo

13,179
18
55
80

511

votes

12 answers

UnicodeEncodeError: 'charmap' codec can't encode characters

I'm trying to scrape a website, but it gives me an error. I'm using the following code: import urllib.request from bs4 import BeautifulSoup get = urllib.request.urlopen("https://www.website.com/") html = get.read() soup = BeautifulSoup(html) And…

python beautifulsoup file-io urllib

asked Nov 23 '14 at 18:47

SstrykerR

7,982
3
12
11

424

votes

21 answers

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

... soup = BeautifulSoup(html, "lxml") File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__ % ",".join(features)) bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to…

python python-2.7 beautifulsoup lxml

asked Jun 25 '14 at 00:12

user3773048

5,839
4
18
22

356

votes

16 answers

How to remove \xa0 from string in Python?

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove all of them in Python 2.7, and change them into…

python python-2.7 unicode beautifulsoup utf-8

asked Jun 12 '12 at 09:12

zhuyxn

6,671
9
38
44

335

votes

1 answer

BeautifulSoup getting href

I have the following soup: next ... From this I want to extract the href, "some_url" I can do it if I only have one tag, but here there are two tags. I can also get the text 'next' but that's not…

python tags beautifulsoup

asked Apr 28 '11 at 08:25

dkgirl

4,489
7
24
26

274

votes

26 answers

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem: from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set() def getLinks(pageUrl): global pages html =…

python web-scraping beautifulsoup scrapy ssl-certificate

asked May 08 '18 at 14:32

Catherine4j

2,772
2
8
10

226

votes

5 answers

TypeError: a bytes-like object is required, not 'str' in python and CSV

TypeError: a bytes-like object is required, not 'str' I'm getting the above error while executing the below python code to save the HTML table data in a CSV file. How do I get rid of that error? import csv import requests from bs4 import…

python beautifulsoup html-table

asked Dec 15 '15 at 07:20

ShivaGuntuku

5,274
6
25
37

213

votes

13 answers

Beautiful Soup and extracting a div and its contents by ID

soup.find("tagName", { "id" : "articlebody" }) Why does this NOT return the <div id="articlebody"> ... </div> tags and stuff in between? It returns nothing. And I know for a fact it exists because I'm staring right at it from…

python beautifulsoup

asked Jan 25 '10 at 22:46

Tony Stark

24,588
41
96
113

211

votes

10 answers

Extracting an attribute value with beautifulsoup

I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code: import urllib f = urllib.urlopen("http://58.68.130.147") s = f.read() f.close() from BeautifulSoup import…

python parsing attributes beautifulsoup

asked Apr 10 '10 at 06:53

Barnabe

2,235
2
14
6

186

votes

26 answers

ImportError: No Module Named bs4 (BeautifulSoup)

I'm working in Python and using Flask. When I run my main Python file on my computer, it works perfectly, but when I activate venv and run the Flask Python file in the terminal, it says that my main Python file has "No Module Named bs4." Any…

python beautifulsoup

asked Aug 02 '12 at 18:47

harryt

2,023
2
14
10

183

votes

16 answers

retrieve links from web page using python and BeautifulSoup

How can I retrieve the links of a webpage and copy the url address of the links using Python?

python web-scraping hyperlink beautifulsoup

asked Jul 03 '09 at 18:29

NepUS

1,899
2
14
9

182

votes

7 answers

How to find children of nodes using BeautifulSoup

I want to get all the tags which are children of

link1

link2

I know how to find element with particular class like…

python html beautifulsoup

asked Jun 09 '11 at 02:40

tej.tan

4,067
6
28
29

161

votes

9 answers

Difference between BeautifulSoup and Scrapy crawler?

I want to make a website that shows the comparison between amazon and e-bay product price. Which of these will work better and why? I am somewhat familiar with BeautifulSoup but not so much with Scrapy crawler.

python beautifulsoup scrapy web-crawler

asked Oct 30 '13 at 15:43

Nishant Bhakta

2,897
3
21
24

158

votes

10 answers

can we use XPath with BeautifulSoup?

I am using BeautifulSoup to scrape an URL and I had the following code, to find the td tag whose class is 'empformbody': import urllib import urllib2 from BeautifulSoup import BeautifulSoup url = …

python web-scraping xpath beautifulsoup urllib

asked Jul 13 '12 at 06:55

Shiva Krishna Bavandla

25,548
75
193
313
