
I'm working on a web scraper that has many different variables, so keeping each variable to a single line is important to me. I have the current variable down to this:

<a href="http://website.com/example/123" target="_blank">Example</a>

Is there a simple way to extract the URL (http://website.com/example/123 in this case) in one line of code?

I'm currently using urllib, re, and BeautifulSoup, so any of those libraries is fine. I tried adding

.find('a', attrs={'href': re.compile("^http://")})

to the end of my line, but then the output was empty.

Vale

1 Answer


I believe all you have to do is yourVarName['href']:

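from bs4 import BeautifulSoup

html = '''<a href="http://website.com/example/123" target="_blank">Example</a>'''

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that actually have an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])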

Found the URL: http://website.com/example/123
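Since the question asks for a single line, here is a minimal sketch of a one-line variant. It also shows one possible reason the .find(...) attempt came back empty: if the variable already held the <a> tag itself (an assumption, since the question does not show the full line), .find() only searches that tag's descendants and never matches the tag it is called on. The names url, tag, and nested are illustrative and not from the original post.

```python
import re
from bs4 import BeautifulSoup

html = '<a href="http://website.com/example/123" target="_blank">Example</a>'

# One-line extraction: parse, take the first <a>, read its href attribute.
url = BeautifulSoup(html, 'html.parser').a['href']
print(url)

# Assumption: the variable already holds the <a> tag. Calling .find() on it
# searches only its descendants, so the tag itself is never matched.
tag = BeautifulSoup(html, 'html.parser').a
nested = tag.find('a', attrs={'href': re.compile('^http://')})
print(nested)

# .get() returns None instead of raising KeyError when href is missing.
print(tag.get('href'))
```

If some anchors might lack an href, tag.get('href') is the safer choice because tag['href'] raises KeyError in that case.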

https://stackoverflow.com/a/5815888/3920284

jeremy