I'm using this code for non-JavaScript pages:

from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://.......")
soup = BeautifulSoup(response.text, "html.parser")
# soup.title is the first <title> tag, or None if the page has none
title = soup.title.string if soup.title else None
print(title)

This does not work with JavaScript-rendered pages, though, so I tried:

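from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://twitter.com/aProfile/")
response.html.render()  # runs the page's JavaScript in headless Chromium
# first=True returns a single Element (or None) instead of a list
title = response.html.find('title', first=True)
print(title.text if title else None)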

How do I grab the title of a JavaScript-rendered page (the one shown on the browser tab) with python-requests and BeautifulSoup?

  • I spent a bit of time looking into it, and it seems easier to just use Selenium to get the title, although that comes with a performance hit. There are a few other suggestions here: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python If you're specifically after Twitter data, you might want to look at the Twitter API. – Andrew Wei Sep 06 '21 at 14:28

1 Answer

This does not use BeautifulSoup, but you can parse a web page and get its title like this:

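import requests
import lxml.html  # a bare "import lxml" does not load the html submodule

r = requests.get('https://www.google.com/')
data = lxml.html.fromstring(r.content)
title = data.findtext('.//title')  # text of the first <title>, or None
print(title)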

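lxml is a library for parsing XML and HTML. Here it finds the title tag in the HTML and returns its text. Note that requests does not execute JavaScript, so this returns the title from the HTML the server sends, not one that a script sets later.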
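If installing lxml is not an option, the same "find the title tag and take its text" idea can be sketched with the standard library's html.parser instead. This is a swapped-in technique, not the lxml approach itself; the names TitleParser, extract_title, and sample_html are made up for illustration, and the sample is a hard-coded string rather than a fetched page. Like the lxml version, it only sees the HTML it is given and does not run JavaScript.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while we are between <title> and </title>
        self._done = False       # True once the first <title> has been closed
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones (e.g. inside inline SVG) are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html_text):
    """Return the stripped title text, or None if no non-empty <title> exists."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


# Assumed sample input; in practice this would be r.text from requests.
sample_html = "<html><head><title> Example page </title></head><body><p>Hi</p></body></html>"
print(extract_title(sample_html))
```

In real use you would pass r.text from the request above to extract_title. For pages that set the title from JavaScript, the HTML has to be rendered first (for example with Selenium, as the comment suggests) before any of these parsers can see the final title.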

Jeremy Savage