
I'm trying to write a script that searches for text in a website's source code. It successfully fetches the source code and prints it, and the output looks something like: b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE html ... and so on

However, when I try to find the div tags in the code using print(page.find('div')), I get the error TypeError: Type str doesn't support the buffer API. I believe this is because I am receiving a bytes literal. How do I decode this as UTF-8 or ASCII so that I can search it for a string?

If needed, here is the simple code I am running:

import urllib.request
from urllib.error import URLError

def get_page(url):
  #build the request
  req = urllib.request.Request(url)

  #get the results of the request
  try:
    #open the url inside the try, so a URLError raised here is caught
    the_page = urllib.request.urlopen(req)
    #read the page (in Python 3 this returns bytes)
    page = the_page.read()
    print(page)
    #this is the line that raises the TypeError
    print(page.find('div'))

  #handle request errors
  except URLError as e:
    #if the error has a reason (thus is a URL error) print the reason
    if hasattr(e, 'reason'):
      print(e.reason)
    #if the error has a code (thus is an HTTP error) print the code and the response body
    if hasattr(e, 'code'):
      print(e.code)
      print(e.read())

Asked by Hat

1 Answer

I assume you are using Python 3 (judging from urllib.request and the b'...' prefix in your output).

In Python 3, read() returns a bytes object, so page is bytes, and bytes.find() needs a bytes argument rather than a str. Search it with a bytes literal instead:

print(page.find(b'div'))
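To see the difference without fetching anything, here is a minimal self-contained sketch. The sample markup assigned to page is made up and only stands in for what read() returns.

```python
# Made-up sample standing in for urlopen(...).read(): in Python 3 this is bytes.
page = b'<!DOCTYPE html>\n<html><body><div id="main">Hello</div></body></html>'

# Searching bytes with a str argument raises a TypeError.
try:
    page.find('div')
except TypeError as exc:
    print('TypeError:', exc)

# Searching with a bytes argument works and returns the byte offset.
print(page.find(b'div'))  # prints 29
```

The exact wording of the TypeError message differs between Python 3 versions, but the fix is the same: the argument must be bytes as well.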

Hope this helps.
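If you would rather work with text, as the question asks, you can decode the bytes to a str first and then search with ordinary strings. The sketch below is one way to do that; fetch_text and decode_page are hypothetical helper names, and falling back to UTF-8 when the server declares no charset is an assumption, not something urllib does for you.

```python
import urllib.request


def decode_page(raw, charset='utf-8'):
    """Decode raw page bytes to str (hypothetical helper).

    errors='replace' substitutes U+FFFD for undecodable bytes
    instead of raising UnicodeDecodeError.
    """
    return raw.decode(charset, errors='replace')


def fetch_text(url, default_charset='utf-8'):
    """Download url and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header, if any;
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_charset
    return decode_page(raw, charset)
```

With a str in hand, page.find('div') works as you originally wrote it, e.g. fetch_text(url).find('div').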

Answered by ldfa