How do I parse an HTML document that I have downloaded to my own computer?

Asked Mar 21 '17 at 13:09

Active Mar 21 '17 at 14:48

Viewed 396 times

I have written a Python program (using `from concurrent.futures import ThreadPoolExecutor`) to collect and download HTML documents from this website (http://lis.ly.gov.tw/lydbc/lydbkmout?.ebe0C1E000901000000DC001E000000000000100000000C0370003dc5). The files are now saved on my computer (file:///Users/XXX.html). When I try to load them with requests and parse them with BeautifulSoup, it fails.

from bs4 import BeautifulSoup
import requests
url = 'file:///Users/martinchen/PycharmProjects/legislative%20yuan%20scratching/list_pages/list_page_1.html'
requests = requests.get(url)
lytext = requests.text
soup = BeautifulSoup(lytext, "html.parser")

Running this raises the following exception:

requests.exceptions.InvalidSchema: No connection adapters were found for 'file:///Users/martinchen/PycharmProjects/legislative%20yuan%20scratching/list_pages/list_page_1.html'

How can I parse an HTML document saved on my own computer (file:///Users/XXX.html) the same way I would parse a page fetched from a web URL (http://XXX.html)?

edited Mar 21 '17 at 14:48

asked Mar 21 '17 at 13:09

MartinChen

You are using requests to load a local file, which requests does not support. To handle local files with requests, try the answer provided here: http://stackoverflow.com/a/22989322/1236628 – gmazlami Mar 21 '17 at 13:15
@gmazlami Thank you! I will try that method later! – MartinChen Mar 21 '17 at 13:29
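
As the comment above notes, requests has no adapter for the `file://` scheme. One way around it, as a minimal sketch rather than the method from the linked answer, is to skip requests and read the saved file straight from disk before handing the text to BeautifulSoup. The helper names `file_url_to_path`, `read_local_html` and `parse_local_html` are hypothetical, and the `utf-8` default is only an assumption about how the pages were saved; pass a different encoding if the text comes out garbled.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(file_url):
    """Convert a file:// URL into a local filesystem path (hypothetical helper)."""
    parsed = urlparse(file_url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {file_url!r}")
    # url2pathname decodes percent-escapes such as %20 back into spaces.
    return Path(url2pathname(parsed.path))


def read_local_html(file_url, encoding="utf-8"):
    """Return the text of a saved HTML file; the encoding is an assumption."""
    return file_url_to_path(file_url).read_text(encoding=encoding)


def parse_local_html(file_url, encoding="utf-8"):
    """Parse a saved HTML file with the same parser used in the question."""
    # Imported here so the two helpers above work even without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(read_local_html(file_url, encoding), "html.parser")
```

With the path from the question, this would be called as `soup = parse_local_html('file:///Users/martinchen/PycharmProjects/legislative%20yuan%20scratching/list_pages/list_page_1.html')`. If the plain filesystem path is already known, `open(path, encoding=...)` followed by `BeautifulSoup(f, "html.parser")` does the same job without the URL conversion.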
