I am getting errors when running it.

import requests
from bs4 import BeautifulSoup

url = "http://sport.citifmonline.com/"
url_page_2 = "url" + "2016/10/15/chelsea-3-0-leicester-city-dominant-blues-comfortable-against-champions-photos/"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html5lib")

links = soup.find_all("a")

for link in links:
    print "<a href='%s'>%s</a>" %(link.get("href"), link.text)

g_data = soup.find_all("div", {"class": "wrapper"})

for item in g_data:
    articles = item.content[0].find_all("a", {"class": "cat-box-content"})[0].text
    try:
        print item.contents[1].find_all("h3", {"class": "post-box-title"})[0].text
    except:
        pass
alecxe
  • What errors? Post the traceback please. – alecxe Oct 17 '16 at 13:43
  • Not everyone will be willing or able to run your code; but some may still be able to provide help if you show exactly what errors you are getting and describe what you have tried to do already to solve the problem. – Daniel Oct 17 '16 at 13:50
  • File "", line 2 print "%s" %(link.get("href"), link.text) ^ IndentationError: expected an indented block @Daniel – Nyamedorba Oct 18 '16 at 06:22
  • After I fixed the indentation error I got this: Traceback (most recent call last): File "", line 2, in File "C:\Python27\lib\encodings\cp850.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 103: character maps to <undefined> @Daniel – Nyamedorba Oct 18 '16 at 06:40

1 Answer

If html5lib is not installed (e.g. with pip install html5lib), BeautifulSoup raises an error as soon as you request that parser. Either install it or use "html.parser" instead, which ships with Python and is also mentioned in the BeautifulSoup documentation:

soup = BeautifulSoup(r.content, "html.parser")
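
A minimal sketch of doing both at once, assuming bs4 is installed: try the preferred parser and fall back to the built-in one. The helper name make_soup and the sample markup are made up for illustration. BeautifulSoup raises bs4.FeatureNotFound when the requested parser is unavailable, so the helper catches that and retries:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="html5lib"):
    """Parse markup with the preferred parser, falling back to html.parser.

    make_soup is a hypothetical helper, not part of BeautifulSoup.
    """
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed; html.parser ships with Python.
        return BeautifulSoup(markup, "html.parser")


# Invented sample markup, so the example runs without a network request.
soup = make_soup("<p><a href='/a'>first</a> <a href='/b'>second</a></p>")
hrefs = [a.get("href") for a in soup.find_all("a")]
print(hrefs)
```

Different parsers can build slightly different trees from broken HTML, so results on a real page may vary depending on which one ends up being used.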

Furthermore, the first line of your second for loop throws a TypeError because you are indexing something that is not subscriptable. The attribute content you are accessing is None: a Tag has a contents list but no content attribute, and BeautifulSoup treats an unknown attribute as a search for a child tag of that name (here a content tag), which does not exist on the page. Rather than indexing into the children, call find_all directly on each element:

item.find_all(...)
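
As a rough illustration of that approach, the sketch below runs against a small inline HTML string rather than the live site. The markup is invented; only the class names (wrapper, post-box-title) come from the question, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Invented sample markup; only the class names are taken from the question.
html = """
<div class="wrapper">
  <h3 class="post-box-title"><a href="/one">First headline</a></h3>
  <h3 class="post-box-title"><a href="/two">Second headline</a></h3>
</div>
<div class="wrapper"><p>No headlines here</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for item in soup.find_all("div", {"class": "wrapper"}):
    # Search inside each wrapper directly instead of indexing item.contents,
    # so a wrapper without headlines is skipped rather than raising an error.
    for heading in item.find_all("h3", {"class": "post-box-title"}):
        titles.append(heading.get_text(strip=True))

print(titles)
```

Because find_all returns an empty list when nothing matches, the inner loop needs no try/except, and real errors are no longer silently swallowed.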
mxscho