BeautifulSoup errors

Question

I am getting errors when running the following code:

import requests
from bs4 import BeautifulSoup

url = "http://sport.citifmonline.com/"
url_page_2 = "url" + "2016/10/15/chelsea-3-0-leicester-city-dominant-blues-comfortable-against-champions-photos/"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html5lib")

links = soup.find_all("a")

for link in links:
    print "<a href='%s'>%s</a>" %(link.get("href"), link.text)

g_data = soup.find_all("div", {"class": "wrapper"})

for item in g_data:
    articles = item.content[0].find_all("a", {"class": "cat-box-content"})[0].text
    try:
        print item.contents[1].find_all("h3", {"class": "post-box-title"})[0].text
    except:
        pass

Not everyone will be willing or able to run your code; but some may still be able to provide help if you show exactly what errors you are getting and describe what you have tried to do already to solve the problem. – Daniel, Oct 17 '16 at 13:50
File "", line 2 print "<a href='%s'>%s</a>" %(link.get("href"), link.text) ^ IndentationError: expected an indented block @Daniel – Nyamedorba, Oct 18 '16 at 06:22
After I fixed the indentation error I got this: Traceback (most recent call last): File "", line 2, in <module> File "C:\Python27\lib\encodings\cp850.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 103: character maps to <undefined> @Daniel – Nyamedorba, Oct 18 '16 at 06:40

score 1 · Answer 1 · edited May 23 '17 at 10:27

If you haven't installed html5lib (e.g. with pip install html5lib), you can't use it as a parser: BeautifulSoup raises a FeatureNotFound error instead. You can either install it or use "html.parser", which ships with Python and is also listed in the BeautifulSoup documentation, so it avoids that error:

soup = BeautifulSoup(r.content, "html.parser")
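If you want the script to run whether or not html5lib is installed, one option is to try html5lib first and fall back to html.parser when BeautifulSoup raises FeatureNotFound. This is a minimal sketch assuming bs4 itself is installed; make_soup is a hypothetical helper name, and the inline HTML just stands in for r.content so no network request is needed:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: prefer html5lib, fall back to the built-in parser."""
    try:
        return BeautifulSoup(markup, "html5lib")
    except FeatureNotFound:
        # html5lib is not installed, so use html.parser, which ships with Python
        return BeautifulSoup(markup, "html.parser")


# Inline HTML standing in for r.content
html = "<div class='wrapper'><a href='/a'>First</a><a href='/b'>Second</a></div>"

soup = make_soup(html)
for link in soup.find_all("a"):
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))
```

Both parsers find the same two links in this small snippet; on messy real-world pages they can build slightly different trees, so it's worth sticking to one parser once your selectors work.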

Furthermore, the first line of your second for loop throws a TypeError because you are trying to index something that is not subscriptable. The attribute content you're accessing doesn't actually exist on the element: BeautifulSoup treats item.content as a search for a child <content> tag, finds none, and returns None, which of course can't be indexed. You should rather call find_all directly on each of the elements:

item.find_all(...)
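Here is a small sketch of the difference, run against made-up markup rather than the live page. Only the class names wrapper and post-box-title come from your code; the HTML itself is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the question
html = """
<div class="wrapper">
  <h3 class="post-box-title">Chelsea 3-0 Leicester City</h3>
</div>
<div class="wrapper">
  <p>No title here</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# item.content is shorthand for item.find("content"); there is no <content>
# tag here, so this is None and indexing it would raise a TypeError
first = soup.find("div", {"class": "wrapper"})
missing = first.content

titles = []
for item in soup.find_all("div", {"class": "wrapper"}):
    # Search inside each wrapper directly; an empty result just skips the loop
    for h3 in item.find_all("h3", {"class": "post-box-title"}):
        titles.append(h3.text.strip())

print(titles)
```

Because find_all returns an empty list when nothing matches, iterating over it (instead of taking [0]) also avoids an IndexError for wrappers without a title.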

answered Oct 17 '16 at 13:55

mxscho

Inb4 pointless downvotes without any constructive criticism. :) – mxscho Oct 17 '16 at 14:49
Probably because `html5lib` is a valid parser so your answer is simply incorrect, the second part of your answer is also wrong. – Padraic Cunningham Oct 18 '16 at 17:42
Thanks, I did with pip install html5lib. – GSandro_Strongs Oct 26 '20 at 00:15
