I'm trying to write a script that will search for text in a websites source code. I have it so it successfully grabs the source code and prints it out, and looks something like:
b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE html
... and so on
However, when trying to search to find the 'div' tags in the code using print(page.find('div'))
, I get an error stating TypeError: Type str doesn't support the buffer API
I believe this has to do with the fact that I am receiving a byte literal. How do I encode this as UTF-8 or ASCII to be able to search for a string?
If needed, here is the simple code I am running:
import urllib.request
from urllib.error import URLError
def get_page(url):
#make the request
req = urllib.request.Request(url)
the_page = urllib.request.urlopen(req)
#get the results of the request
try:
#read the page
page = the_page.read()
print(page)
print(page.find('div'))
#except error
except URLError as e:
#if error has a reason (thus is url error) print the reason
if hasattr(e, 'reason'):
print(e.reason)
#if error has a code (thus is html error) print the code and the error
if hasattr(e, 'code'):
print(e.code)
print(e.read())