
I am trying to do some web scraping with BS4.

So far I have extracted the <a> elements using:

urls = [item for item in soup.select('h4 a')]

However, I only want the URLs of links whose id starts with "entry", for example:

<a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a>

I have tried item.id but it does not work.

What am I missing?

abdusco
user7692855

1 Answer


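Use the re module together with the id keyword argument of find(). Passing a compiled regular expression matches any id that starts with entry_. Note that find() returns only the first match; find_all() takes the same arguments and returns every match.
Here's how: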

from bs4 import BeautifulSoup
import re

if __name__ == "__main__":
    html = '<a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a>'
    soup = BeautifulSoup(html, 'html.parser')

    print(soup.find('a', id=re.compile('^entry_')))

output:

<a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a>
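
To collect every matching URL rather than only the first element, something along these lines may work. This is a minimal sketch: the two extra links and the surrounding <h4> tags are made-up sample markup (only the first link comes from the question), and the variable names are illustrative. It also shows why item.id fails: attribute-style access on a tag looks for a child tag named <id>, so it returns None here; the id attribute is read with item.get('id') or item['id'].

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first link appears in the question.
html = """
<h4><a href="http://www.sampleurl.com/static/welcome" id="entry_1">Lamborghini </a></h4>
<h4><a href="http://www.sampleurl.com/static/other" id="other_1">Other </a></h4>
<h4><a href="http://www.sampleurl.com/static/second" id="entry_2">Second </a></h4>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: CSS attribute selector; [id^="entry"] means "id starts with entry".
css_urls = [a["href"] for a in soup.select('h4 a[id^="entry"]')]

# Option 2: find_all() with a compiled regex, returning all matches.
regex_urls = [a["href"] for a in soup.find_all("a", id=re.compile(r"^entry"))]

# Option 3: keep the original select() and filter on the id attribute.
# a.id would search for a child <id> tag, so use a.get("id") instead.
manual_urls = [
    a["href"]
    for a in soup.select("h4 a")
    if a.get("id", "").startswith("entry")
]

print(css_urls)
```

The CSS selector keeps everything in a single select() call, which fits the code in the question most closely; the regex form is handy when the pattern is more complex than a simple prefix.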
abdusco