How to scrape websites based on the site's title using python?

Question

I'm trying to scrape websites to find pages that have a certain title. For example, how would I check "example.com/xxxxxxxxxx", where each "x" is a random digit, to see whether its title is 404 or not?

marsnebulasoup · Accepted Answer · 2019-05-04T16:57:40.297

This finds the title of the page:

import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    req = requests.get(url)
    tree = fromstring(req.content)
    title = tree.findtext('.//title')
    return title


url = "http://www.google.com"
title = Get_PageTitle(url)

if title and "404" in title:
    # title contains 404 (title is None when the page has no <title> tag)
    print("Title has 404 in it")

else:
    #no 404 in title
    pass

Edit:

The above code checks whether the title contains 404. If you want to know whether the title is exactly 404, use this code:

import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    req = requests.get(url)
    tree = fromstring(req.content)
    title = tree.findtext('.//title')
    return title


url = "http://www.google.com"
title = Get_PageTitle(url)

if title is not None and title.strip() == "404":
    # title is exactly 404 (compare with ==, since "is" tests object identity)
    print("Title is 404")
    print(title)

else:
    #title is not 404
    pass
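
To tie this back to the question's "example.com/xxxxxxxxxx" case, here is a minimal sketch of how the title check might be applied to randomly generated URLs. The names random_url, title_is_404, get_page_title and find_pages_without_404 are hypothetical, and the base URL, the 10-digit path length and the 10-second timeout are assumptions rather than anything stated in the question.

```python
import random


def random_url(base="http://example.com/", digits=10):
    # Build a URL whose path is `digits` random decimal digits (assumed format).
    path = "".join(random.choice("0123456789") for _ in range(digits))
    return base + path


def title_is_404(title):
    # True only when the title text is exactly "404", ignoring surrounding whitespace.
    # title is None when the page has no <title> element.
    return title is not None and title.strip() == "404"


def get_page_title(url):
    # Third-party imports are kept local so the helpers above can be used
    # without requests and lxml installed.
    import requests
    from lxml.html import fromstring

    resp = requests.get(url, timeout=10)
    return fromstring(resp.content).findtext('.//title')


def find_pages_without_404(count=5):
    # Try `count` random URLs and collect those whose title is not "404".
    import requests

    found = []
    for _ in range(count):
        url = random_url()
        try:
            title = get_page_title(url)
        except requests.RequestException:
            # Skip URLs that could not be fetched at all.
            continue
        if not title_is_404(title):
            found.append((url, title))
    return found
```

Calling find_pages_without_404(20) would try 20 random URLs and return a list of (url, title) pairs for the pages that did not come back with a 404 title. To match titles that merely contain 404, title_is_404 could be swapped for a check using "in", as in the first snippet.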

How to get page title in requests

I'm not looking for the website code, I'm looking for the page title — Kostel, May 04 '19 at 16:51
Ok...this checks the title to see if it has 404 in it...isn't that what you want? — marsnebulasoup, May 04 '19 at 16:53
