How to get page title in requests

Question

What would be the simplest way to get the title of a page in Requests?

r = requests.get('http://www.imdb.com/title/tt0108778/')
# ? r.title
Friends (TV Series 1994–2004) - IMDb

If you're doing anything beyond grabbing a single page, [Scrapy](http://scrapy.org/doc/) might be very useful. — Nick T, Nov 08 '14 at 01:02

score 27 · Accepted Answer · edited May 23 '17 at 12:25

You need an HTML parser to parse the HTML response and get the title tag's text:

Example using lxml.html:

>>> import requests
>>> from lxml.html import fromstring
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> tree = fromstring(r.content)
>>> tree.findtext('.//title')
u'Friends (TV Series 1994\u20132004) - IMDb'
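If installing a third-party parser is not an option, the same idea can be sketched with the standard library's html.parser. This is only a minimal sketch and not part of the original answer: TitleParser and extract_title are hypothetical names, and the sample string stands in for r.text.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TITLE> is seen as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self):
        text = "".join(self._chunks).strip()
        return text or None


def extract_title(html_text):
    """Return the page title, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


# Sample markup standing in for r.text from requests.get(...).
sample = "<html><head><TITLE>\nHandling the Keyboard</TITLE></head><body></body></html>"
print(extract_title(sample))
```

For anything beyond the title, a full parser such as lxml is likely the more convenient choice.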

There are other options too, such as the mechanize library:

>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open('http://www.imdb.com/title/tt0108778/')
>>> br.title()
'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'

Which option to choose depends on what you are going to do next: parse the page to get more data, or interact with it (click buttons, submit forms, follow links, etc.).

Besides, you may want to use an API provided by IMDb instead of resorting to HTML parsing.

Example usage of the IMDbPY package:

>>> from imdb import IMDb
>>> ia = IMDb()
>>> movie = ia.get_movie('0108778')
>>> movie['title']
u'Friends'
>>> movie['series years']
u'1994-2004'

Simplest and most straightforward answer. Thank you. – Seth, Mar 27 '20 at 18:55

Greg · Answer 2 · 2014-11-08T01:05:56.203

17

You could use beautifulsoup to parse the HTML.

Install it using pip install beautifulsoup4

>>> import requests
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> import bs4
>>> html = bs4.BeautifulSoup(r.text, 'html.parser')
>>> html.title
<title>Friends (TV Series 1994–2004) - IMDb</title>
>>> html.title.text
u'Friends (TV Series 1994\u20132004) - IMDb'

edited Nov 08 '14 at 01:05

answered Nov 08 '14 at 00:59

Greg

Why is there an `r` before `requests.get`? – Avinash Raj Nov 08 '14 at 01:05

score 13 · Answer 3 · edited Mar 28 '21 at 08:19

No need to import other libraries. The title can be sliced straight out of the response text that requests returns.

>>> import requests
>>> # The browser string belongs under the 'User-Agent' header key.
>>> headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
>>> n = requests.get('http://www.imdb.com/title/tt0108778/', headers=headers)
>>> al = n.text
>>> al[al.find('<title>') + 7 : al.find('</title>')]
u'Friends (TV Series 1994\u20132004) - IMDb'
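One caveat with the slicing above: str.find returns -1 when the substring is missing, so a page without a literal <title> yields a meaningless slice instead of an error. A minimal sketch of a guarded version follows. The name title_by_slice is hypothetical, and the sample strings stand in for n.text.

```python
def title_by_slice(html_text):
    """Return the text between a literal <title> and </title>, or None."""
    open_tag, close_tag = "<title>", "</title>"
    start = html_text.find(open_tag)
    if start == -1:
        return None  # no literal, lowercase <title> in the text
    start += len(open_tag)
    end = html_text.find(close_tag, start)
    if end == -1:
        return None  # opening tag without a matching close
    return html_text[start:end].strip()


# Sample strings standing in for n.text.
print(title_by_slice("<head><title>Friends - IMDb</title></head>"))  # Friends - IMDb
print(title_by_slice("<head></head>"))  # None
```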

Update after ZN13's comment

>>> import re
>>> import requests
>>> n = requests.get('https://www.libsdl.org/release/SDL-1.2.15/docs/html/guideinputkeyboard.html')
>>> al = n.text
>>> d = re.search(r'<\W*title\W*(.*)</title', al, re.IGNORECASE)
>>> d.group(1)
u'Handling the Keyboard'

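This handles extra non-alphanumeric characters (such as whitespace or a newline) around the tag name and matches the tag case-insensitively. It still assumes the title text itself sits on one line, because . does not match a newline unless re.DOTALL is set.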
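Note that the pattern above would also capture any attributes on the opening tag as part of the title. As a hedged sketch (not the answerer's code), a slightly more tolerant variant could allow attributes and newlines and decode HTML entities. The name title_by_regex and the sample string are hypothetical.

```python
import html
import re

# Allow attributes on the opening tag, any letter case, and newlines in the title text.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def title_by_regex(html_text):
    """Return the first <title> text, entities decoded and whitespace collapsed, or None."""
    match = TITLE_RE.search(html_text)
    if match is None:
        return None
    text = html.unescape(match.group(1))
    return " ".join(text.split())


# Made-up sample with an attribute, newlines, and an entity.
sample = '<HEAD><TITLE lang="en"\n>Handling\n the Keyboard &amp; Mouse</TITLE\n></HEAD>'
print(title_by_regex(sample))  # Handling the Keyboard & Mouse
```

Regular expressions remain a shortcut here. For arbitrary real-world HTML, a proper parser is the safer tool.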

edited Mar 28 '21 at 08:19

Tonechas

answered Jan 31 '17 at 12:40

Rahul Chawla

This doesn't work if the title tag has a newline in it, for example `` (for example with this page: https://www.libsdl.org/release/SDL-1.2.15/docs/html/guideinputkeyboard.html) – ZN13 Dec 05 '17 at 19:06
The regexp should be `'<\W*title\W*(.*) – alissonmuller Feb 09 '20 at 15:51

score 5 · Answer 4 · answered Apr 26 '18 at 01:33

Use requests-html ("Pythonic HTML Parsing for Humans"):

from requests_html import HTMLSession

print(HTMLSession().get('http://www.imdb.com/title/tt0108778/').html.find('title', first=True).text)

answered Apr 26 '18 at 01:33

井上智文

score 2 · Answer 5 · answered Nov 11 '17 at 09:24

Regex with lookbehind and lookahead:

re.search('(?<=<title>).+?(?=</title>)', mytext, re.DOTALL).group().strip()

re.DOTALL is needed because the title can contain a newline character (\n).
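Two things are worth keeping in mind with this approach. re.search returns None when nothing matches, so chaining .group() raises AttributeError on a page without a title. The lookbehind also matches only a literal <title>, since Python's re module requires fixed-width lookbehinds. A minimal sketch with a guard follows, where title_by_lookaround is a hypothetical helper and mytext is sample data:

```python
import re

TITLE_LOOKAROUND = re.compile(r"(?<=<title>).+?(?=</title>)", re.DOTALL)


def title_by_lookaround(mytext):
    """Return the stripped title text, or None when no <title>...</title> is found."""
    match = TITLE_LOOKAROUND.search(mytext)
    return match.group().strip() if match else None


# Sample text with newlines inside the title element.
mytext = "<head><title>\nHandling the Keyboard\n</title></head>"
print(title_by_lookaround(mytext))         # Handling the Keyboard
print(title_by_lookaround("<p>none</p>"))  # None
```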

answered Nov 11 '17 at 09:24

Vitaly Zdanevich
