How to get <li>'s using BeautifulSoup

Question

I'm trying to get the <li>'s of an HTML page using Python's BeautifulSoup library.

The HTML I'm trying to parse is this one:

https://ccnav6.com/ccna-4-chapter-1-exam-answers-2017-v5-0-3-v6-0-full-100.html

It contains a list of questions and answers and I'm trying to parse those.

My problem is that no matter how I go about parsing the HTML, I only get the first <li>.

My code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

url = 'https://ccnav6.com/ccna-4-chapter-1-exam-answers-2017-v5-0-3-v6-0-full-100.html'
uClient = uReq(url)
# getting html from connection
page_html = uClient.read()
# close connection
uClient.close()
# use beautifulSoup to parse html
page_soup = soup(page_html, "html.parser")
# get main content of page
contentBlock = page_soup.find("div",{"class":"post-single-content box mark-links entry-content"})
# get all questions and answers
questions = contentBlock.div.ol.li.ol.find_all("li")
# for some reason I'm only getting the first question

Change the parser from `html.parser` to `lxml` and it'll work. Not exactly sure why though, maybe the HTML is broken. You'll need to download that parser first. `pip install lxml`. — Keyur Potdar, Mar 28 '18 at 05:21
@KeyurPotdar wow, thank you so much. Really weird behaviour ... i'm new to web-scraping and was sitting here for a few hours not understanding why it only outputs the first element ... — Time4Boom, Mar 28 '18 at 05:25
The HTML there is broken, as it contains closing tags without matching opening tags. Try one of the other parsers, such as `lxml` or `html5lib`. — Martijn Pieters, Mar 28 '18 at 09:10
Both `lxml` and `html5lib` produce 27 `li` elements, `html.parser` really doesn't like those stray closing tags. — Martijn Pieters, Mar 28 '18 at 09:15
@MartijnPieters oh, I didn't even notice that. Thanks for the help. — Time4Boom, Mar 28 '18 at 10:32
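
One way to check whether the parser is the culprit, as the comments above suggest, is to feed the same markup to every installed parser and compare how many `li` elements end up inside the container. The sketch below is only an illustration: `SAMPLE_HTML` is a made-up snippet with a stray closing tag (not the real page), and `count_li_by_parser` is a hypothetical helper name. The counts you get may or may not differ, depending on the bs4 version and on which parsers are installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup with a stray </span> closing tag.
# This is NOT the real page, just a small stand-in for broken HTML.
SAMPLE_HTML = (
    "<div id='content'><ol>"
    "<li>Q1</li></span><li>Q2</li><li>Q3</li>"
    "</ol></div>"
)


def count_li_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <li> found inside the #content div}."""
    counts = {}
    for parser in parsers:
        try:
            tree = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed; skip it instead of failing.
            continue
        container = tree.find(id="content")
        # If the container is missing, report zero rather than raising.
        counts[parser] = len(container.find_all("li")) if container else 0
    return counts


if __name__ == "__main__":
    # Differing numbers here would point at the parser, not at your selectors.
    print(count_li_by_parser(SAMPLE_HTML))
```

If the numbers disagree for your real page, switching the second argument of `BeautifulSoup(...)` to a parser that reports the full count is the simplest fix.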
