Scraping text in meta tag with selenium

Question

I'm trying to get the book description from the following webpage: https://bookshop.org/books/lucky-9798200961177/9781668002452

This is what I've got so far:

***EDIT***
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)
driver.get('https://bookshop.org/a/16709/9781668002452')
description = driver.find_element_by_xpath("//meta[@name='description']").get_attribute("content")
description

Basically, I'm trying to get the text inside this HTML:

<meta name="description" content="REESE'S BOOK CLUB PICK NEW YORK TIMES BESTSELLER A thrilling roller-coaster ride about a heist gone terribly wrong, with a plucky protagonist who will win readers' hearts. What if you had the winning ticket ....">

I end up with the following error:

Message: no such element: Unable to locate element: {"method":"xpath","selector":"//meta[@name='description']"}

score 3 · Answer 1 · answered Mar 02 '22 at 01:27

elem = driver.find_element(By.XPATH, "//meta[@name='description']")
print(elem.get_attribute("content"))

Use an XPath that matches the meta element by its name attribute, then read that element's content attribute.

Imports:

from selenium.webdriver.common.by import By
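
A minimal sketch of the same lookup wrapped in a helper, in case it is useful for checking whether the tag is present at all. The helper name `get_meta_description` and the constant `DESCRIPTION_XPATH` are made up for illustration; `driver` is assumed to be a Selenium WebDriver that has already loaded the page. It uses `find_elements` (plural), which returns an empty list rather than raising `NoSuchElementException` when nothing matches.

```python
# "xpath" is the string value of Selenium's By.XPATH constant, so this
# sketch does not need to import By.
DESCRIPTION_XPATH = "//meta[@name='description']"


def get_meta_description(driver):
    """Return the description meta tag's content, or None if the tag is absent.

    `driver` is assumed to be a Selenium WebDriver with a page already loaded.
    """
    # find_elements returns a (possibly empty) list instead of raising.
    matches = driver.find_elements("xpath", DESCRIPTION_XPATH)
    if not matches:
        return None
    return matches[0].get_attribute("content")
```

If the helper returns None, printing `driver.title` or the first few hundred characters of `driver.page_source` may show which page the headless browser actually received.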

Arundeep Chohan


Thanks, I've tried the above steps, however I ended up with the following error message: `NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//meta[@name='description']"} (Session info: headless chrome=98.0.4758.102)` – soma Mar 02 '22 at 10:30

score 1 · Answer 2 · edited Mar 02 '22 at 08:41

You need to target the element with the correct XPath. Your XPath, //meta[@content], returns the first meta element that has a content attribute, whichever one that happens to be. I would recommend the XPath //meta[@name="description"] or the CSS selector meta[name="description"] for a more precise selection. This works perfectly:

# imports and boilerplate
....

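# find_element_by_css_selector was removed in Selenium 4.3; use find_element with By instead
# (requires: from selenium.webdriver.common.by import By)
description_meta_element = driver.find_element(By.CSS_SELECTOR, 'meta[name="description"]')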
description_meta_content = description_meta_element.get_attribute('content')
print(description_meta_content)
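
To illustrate the difference between the two XPath expressions, here is a small offline sketch using the standard library's `xml.etree.ElementTree` instead of Selenium. The sample markup is hypothetical and written as well-formed XML so that ElementTree can parse it; real pages usually are not well-formed, so this only demonstrates how the selectors behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified <head>, written as well-formed XML.
SAMPLE = """
<html>
  <head>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width"/>
    <meta name="description" content="A thrilling heist story."/>
  </head>
  <body/>
</html>
"""

root = ET.fromstring(SAMPLE)

# Matches the first <meta> that has any content attribute: here, the viewport tag.
first_with_content = root.find(".//meta[@content]")

# Matches only the <meta> whose name attribute equals "description".
description = root.find(".//meta[@name='description']")

print(first_with_content.get("name"))
print(description.get("content"))
```

The attribute-only predicate picks up whichever meta tag with a content attribute comes first in the document, which is why the name-based predicate is the safer choice.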

score 1 · Answer 3 · answered Mar 02 '22 at 22:51

This <meta> tag...

<meta name="description" content="REESE'S BOOK CLUB PICK NEW YORK TIMES BESTSELLER A thrilling roller-coaster ride about a heist gone terribly wrong, with a plucky protagonist who will win readers' hearts. What if you had the winning ticket ....">

...is within the <head> section. So Selenium won't be able to scrape this element.

Solution

In this case your best bet would be to use BeautifulSoup with urllib.request as follows:

from bs4 import BeautifulSoup
from urllib.request import urlopen #  In python3, urllib2 has been split into urllib.request and urllib.error

webpage = urlopen('https://bookshop.org/books/lucky-9798200961177/9781668002452').read()
soup = BeautifulSoup(webpage, "lxml")
my_meta = soup.find("meta",{"name":"description"})
print(my_meta["content"])  # the attribute name must be a string key
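
The lookup step can also be tried offline on an HTML string, independently of how the page was fetched. The sketch below uses the standard library's `html.parser` rather than BeautifulSoup; the class name `MetaDescriptionParser`, the function `extract_description`, and the sample string are made up for illustration. The `html` argument could equally be the decoded result of `urlopen(...).read()` or Selenium's `driver.page_source`.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and names are lower-cased.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "description":
            self.description = attributes.get("content")


def extract_description(html):
    """Return the description meta content from an HTML string, or None."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


# Hypothetical sample; the meta tag is shortened from the one quoted above.
sample_html = (
    '<html><head><meta name="description" '
    'content="A thrilling roller-coaster ride about a heist gone terribly wrong.">'
    "</head><body></body></html>"
)
print(extract_description(sample_html))
```

Unlike an XML parser, `html.parser` accepts the unclosed `<meta ...>` form shown in the page source.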

References

You can find a couple of relevant detailed discussions in:

I've tried your solution and went through the links, but in every case I ended up with either `HTTPError: HTTP Error 503: Service Temporarily Unavailable` or `HTTPError: HTTP Error 403: Forbidden`. I also tried the steps in https://stackoverflow.com/questions/13303449/urllib2-httperror-http-error-403-forbidden, but that didn't help either. I'm having trouble getting the same content that Chrome's inspector shows into the soup (currently, this is the only approach that loads at least something from the page). Do you have any ideas? I'm stuck on this one. – soma Mar 06 '22 at 22:17

score 0 · Answer 4 · answered Mar 01 '22 at 23:52

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)

driver.get('https://bookshop.org/books/lucky-9798200961177/9781668002452')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
Description = soup.find_all('div', class_="title-description")
print(Description[0].text)
