I'm trying to figure out how to get Python 3 to display a certain phrase from an HTML document. As an example, I'll be using the search engine https://duckduckgo.com.

I'd like the code to search the page for `var error=document.getElementById` and display what is inside the parentheses that follow it; in this case, that would be "error_homepage". Any help would be appreciated.

import urllib.request

# ask for the page to fetch
u = input('Please enter URL: ')
x = urllib.request.urlopen(u)
# print the raw response body (bytes)
print(x.read())
mohammed wazeem
Jake
  • To do that you need an HTML parser like `beautifulsoup` – coder Jan 29 '18 at 09:42
  • Possible duplicate of [Examples for string find in Python](https://stackoverflow.com/questions/674764/examples-for-string-find-in-python) – aja Jan 29 '18 at 10:12

1 Answer

You can simply read the website of interest, as you suggested, using urllib.request, and use regular expressions to search the retrieved HTML/JS/... code:

import re
import urllib.request

# the URL that data is read from
url = "http://..."

# the regex pattern for extracting element IDs; \s* tolerates optional
# whitespace around "=", so "var error=document..." matches as well
pattern = r"var\s+error\s*=\s*document\.getElementById\(['\"](?P<element_id>[a-zA-Z0-9_-]+)['\"]\);"

# fetch HTML code
with urllib.request.urlopen(url) as f:
    html = f.read().decode("utf8")

# extract element IDs
for m in re.findall(pattern, html):
    print(m)
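
If you want to try the matching step without hitting the network, here is a minimal sketch that runs the same kind of pattern against a string. The helper name `extract_element_ids` and the `sample_html` snippet are made up for illustration; the snippet only assumes the page contains a statement shaped like the one in the question, with or without spaces around `=`.

```python
import re

# Same idea as the pattern above, compiled once. It accepts optional
# whitespace around "=" and either single or double quotes around the ID.
PATTERN = re.compile(
    r"""var\s+error\s*=\s*document\.getElementById\(\s*['"](?P<element_id>[\w-]+)['"]\s*\)"""
)


def extract_element_ids(html):
    """Return the IDs passed to getElementById in `var error = ...` statements.

    Hypothetical helper, not part of any library.
    """
    return [m.group("element_id") for m in PATTERN.finditer(html)]


# Made-up sample standing in for the downloaded page source.
sample_html = """
<html><body>
<script>
var error=document.getElementById("error_homepage");
var error = document.getElementById('other-id');
</script>
</body></html>
"""

print(extract_element_ids(sample_html))
```

Using `finditer` with the named group keeps it explicit which part of the match is returned. With the real page you would pass in the decoded `html` string instead of `sample_html`.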
paho