Extracting an element whose id starts with a certain string using BeautifulSoup in python

Question

I am trying to do some web scraping with BS4.

So far I have extracted the <a> elements using

urls = [item for item in soup.select('h4 a')]

However, I only want the URLs where the id starts with "entry".

<a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a>

I have tried item.id but it does not work.

What am I missing?

Yep, if condition is "ID starts **with** 'entry'" `urls = [item for item in soup.select('h4 a') if item.get("id", "")[:6] == "entry_"]` — ipaleka, Jul 03 '19 at 20:11
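To spell out why item.id fails: on a Beautiful Soup Tag, attribute-style access such as item.id looks for a child tag named <id>, so it returns None here. HTML attributes are read with item["id"] or item.get("id"). The sketch below shows that, and rewrites the slice comparison from the comment above with str.startswith. The surrounding <h4> markup and the second and third links are made up for illustration; only the first link comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: only the first <a> is taken from the question.
html = """
<h4><a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a></h4>
<h4><a href="http://www.sampleurl.com/static/other" id="promo_1">Other </a></h4>
<h4><a href="http://www.sampleurl.com/static/noid">No id </a></h4>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.select("h4 a")[0]
print(first.id)         # None: this searches for a child <id> tag, not the attribute
print(first.get("id"))  # entry_1

# .get("id", "") returns "" for links without an id, so startswith never sees None.
entries = [a for a in soup.select("h4 a") if a.get("id", "").startswith("entry_")]
urls = [a["href"] for a in entries]
print(urls)
```

Using .get with a default avoids a KeyError on links that have no id at all, which a[\"id\"] would raise.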

score 5 · Accepted Answer · answered Jul 03 '19 at 20:26


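Use the re module together with the id keyword argument of find(): pass a compiled regular expression and Beautiful Soup matches it against each tag's id.
Here's how: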

from bs4 import BeautifulSoup
import re

if __name__ == "__main__":
    html = '<a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a>'
    soup = BeautifulSoup(html, 'html.parser')

    print(soup.find('a', id=re.compile('^entry_')))

output:

<a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a>
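If you would rather keep the select() call from the question, a CSS attribute selector can do the prefix match without a regular expression. This is a minimal sketch, assuming Beautiful Soup 4.7 or later (where select() is backed by the soupsieve package); the <h4> wrappers and the second link are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: only the first <a> is taken from the question.
html = """
<h4><a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a></h4>
<h4><a href="http://www.sampleurl.com/static/other" id="promo_1">Other </a></h4>
"""
soup = BeautifulSoup(html, "html.parser")

# [id^="entry_"] matches elements whose id attribute starts with "entry_".
matches = soup.select('h4 a[id^="entry_"]')
urls = [a["href"] for a in matches]
print(urls)
```

The same list could be built with soup.find_all('a', id=re.compile('^entry_')), at the cost of losing the "inside an h4" restriction unless you add it separately.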

answered by abdusco


now you have 2 problems – Corey Goldberg Jul 03 '19 at 21:29
True, but I think it's OK for one-off solutions. Also, it's an official/recommended way of selecting nodes: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-regular-expression – abdusco Jul 03 '19 at 21:31
