How to use CSS selectors to retrieve specific links using BeautifulSoup?

Question

I'm using Python to scrape the page alfabeta.surge.sh, and I would like to get the link at `#home1 > div:nth-child(10) > table:nth-child(29) > tbody > tr:nth-child(1) > td:nth-child(3) > a`.

Currently I'm doing this:

import bs4, requests

# Fetch the page and parse it
res = requests.get('https://alfabeta.surge.sh/')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# Take the 24th <a> on the page purely by position
print(soup.find_all('a')[23].attrs.get('href'))

But if the position of the link changes, I can't retrieve it anymore.

Does this answer your question? [How to use CSS selectors to retrieve specific links lying in some class using BeautifulSoup?](https://stackoverflow.com/questions/24801548/how-to-use-css-selectors-to-retrieve-specific-links-lying-in-some-class-using-be) — Will Da Silva, Jun 15 '21 at 03:52

score 0 · Accepted Answer · answered Jun 15 '21 at 05:30

You will need to make some assumptions about what is most likely to remain constant, and then revisit them over time as the page changes. For example, I might assume you want the href of the a tag that is a child of the 3rd-column td, in the first table following the div containing the string Catálogo Actualizaciones. One CSS pattern for that would be as follows:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://alfabeta.surge.sh/')
soup = bs(r.text, 'lxml')
print(soup.select_one('div:-soup-contains("Catálogo Actualizaciones") ~ table td:nth-child(3) > a')['href'])
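
To see how that pattern behaves without hitting the live site, here is a minimal sketch run against a small made-up HTML snippet. The markup, the `/files/update.zip` path, and the helper name `link_after_heading` are all assumptions for illustration, not the real page. It also assumes a soupsieve version that supports `:-soup-contains`, and it returns `None` instead of raising when nothing matches:

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely imitating the layout described in the question.
SAMPLE_HTML = """
<div id="home1">
  <table>
    <tr><td>0</td><td>Decoy</td><td><a href="/files/decoy.zip">Decoy</a></td></tr>
  </table>
  <div>Catálogo Actualizaciones</div>
  <table>
    <tr><td>1</td><td>Update</td><td><a href="/files/update.zip">Download</a></td></tr>
  </table>
</div>
"""


def link_after_heading(html, heading_text):
    """Return the href of the 3rd-column link in a table that follows
    a div containing heading_text, or None if there is no match."""
    soup = BeautifulSoup(html, "html.parser")
    # "~" only matches tables that come after the div and share its parent,
    # so the decoy table placed before the div is ignored.
    selector = f'div:-soup-contains("{heading_text}") ~ table td:nth-child(3) > a'
    link = soup.select_one(selector)
    # select_one returns None when nothing matches; avoid indexing None.
    return link.get("href") if link is not None else None


print(link_after_heading(SAMPLE_HTML, "Catálogo Actualizaciones"))  # /files/update.zip
```

Note that the heading text is interpolated straight into the selector, so a string containing double quotes would need escaping first.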

QHarr

I tried that, but I got `NotImplementedError: ':-soup-contains' pseudo-class is not implemented at this time` – Matias Corvalan Jun 16 '21 at 01:39
`print(soup.select_one('div:contains("Catálogo Actualizaciones") ~ table td:nth-child(3) > a')['href'])` – QHarr Jun 16 '21 at 01:54
Or upgrade your bs4/soupsieve to the latest version. – QHarr Jun 16 '21 at 01:54
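
If upgrading is not an option, the two spellings from the comments above can be tried in turn. This is only a sketch under assumptions: the helper name `select_after_text` and the sample markup are hypothetical, and the exception types caught here are a guess at what different bs4/soupsieve versions raise for an unsupported pseudo-class (the comment above reports `NotImplementedError`), so adjust them to whatever your installed version actually throws.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not the real page.
html = """
<div id="home1">
  <div>Catálogo Actualizaciones</div>
  <table>
    <tr><td>1</td><td>Update</td><td><a href="/files/update.zip">Download</a></td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def select_after_text(soup, text, tail):
    """Try the newer :-soup-contains first, then the older :contains."""
    for pseudo in (":-soup-contains", ":contains"):
        selector = f'div{pseudo}("{text}") {tail}'
        try:
            with warnings.catch_warnings():
                # Newer soupsieve versions warn that :contains is deprecated.
                warnings.simplefilter("ignore", FutureWarning)
                return soup.select_one(selector)
        except (NotImplementedError, SyntaxError):
            # Assumed failure modes for an unsupported pseudo-class.
            continue
    return None


link = select_after_text(soup, "Catálogo Actualizaciones", "~ table td:nth-child(3) > a")
print(link["href"] if link is not None else "no match")
```

The first spelling that the installed library accepts decides the result, so on an up-to-date install the `:contains` branch is never reached.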
