I'm working on scraping websites for websites which contain a certain title. How would I make it, for example, check "example.com/xxxxxxxxxx" where "x" is a random number if it has title 404 or not?
Asked
Active
Viewed 58 times
-2
-
2Where is your code? – Olvin Roght May 04 '19 at 16:44
1 Answers
0
This finds the title of the page:
import requests
from lxml.html import fromstring
def Get_PageTitle(url):
req = requests.get(url)
tree = fromstring(req.content)
title = tree.findtext('.//title')
return title
url = "http://www.google.com"
title = Get_PageTitle(url)
if "404" in title:
#title has 404
print("Title has 404 in it")
else:
#no 404 in title
pass
Edit:
The above code checks if the title has 404 in it. If you want to know if the title is 404, use this code:
import requests
from lxml.html import fromstring
def Get_PageTitle(url):
req = requests.get(url)
tree = fromstring(req.content)
title = tree.findtext('.//title')
return title
url = "http://www.google.com"
title = Get_PageTitle(url)
if "404" is title:
#title is 404
print("Title is 404 in it")
print(title)
else:
#title is not 404
pass

marsnebulasoup
- 2,530
- 2
- 16
- 37
-
-
Ok...this checks the title to see if it has 404 in it...isn't that what you want? – marsnebulasoup May 04 '19 at 16:53