I would like to use Python to automatically check Troy Hunt's website "https://haveibeenpwned.com/Passwords" and see whether he has published a new password file. To do this, I fetch the page and want to search it for a string that tells me the current version of the file. The files are always named after the schema ....v5.7z, where the number after the v is the version.
# -*- coding: utf-8 -*-
from urllib2 import Request, urlopen, URLError, HTTPError

someurl = 'https://haveibeenpwned.com/Passwords'
# Send a browser-like User-Agent with the request
req = Request(someurl, headers={'User-Agent': 'Mozilla/5.0'})
try:
    response = urlopen(req)
except HTTPError as e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except URLError as e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    print "everything is fine"
    # Reuse the response from the try block instead of requesting the page twice
    the_page = response.read()
    print(the_page)
"the_page" now contains the entire HTML code of the page. How can I search it?
I'm not allowed to use BeautifulSoup or any other parser.
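To make clearer what I am after, here is a rough sketch of the kind of search I have in mind, using only the standard `re` module. It assumes the file names appear literally in the HTML and end in `v<number>.7z`. The function name `find_latest_version` and the sample HTML are made up for illustration and are not taken from the real page. Under Python 3, `the_page` would be bytes and would probably need to be decoded first.

```python
import re

# Matches the version number in names ending in "v<digits>.7z"
VERSION_PATTERN = re.compile(r'v(\d+)\.7z')

def find_latest_version(html):
    """Return the highest version number found in html, or None if there is none."""
    versions = [int(v) for v in VERSION_PATTERN.findall(html)]
    return max(versions) if versions else None

# Hypothetical snippet standing in for the_page; the file names are invented
sample_page = ('<a href="/files/example-v4.7z">old</a> '
               '<a href="/files/example-v5.7z">new</a>')
latest = find_latest_version(sample_page)
print(latest)
```

With the real page I would call `find_latest_version(the_page)` and compare the result against the last version I have already downloaded.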