With the following function, I can grab the text and link I need from a website:

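import requests
from bs4 import BeautifulSoup

def get_url_text(url):
    source = requests.get(url)
    plain_text = source.text
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    soup = BeautifulSoup(plain_text, 'html.parser')
    # Each podcast entry is an <li class="ptb2"> element
    for item_name in soup.find_all('li', {'class': 'ptb2'}):
        print(item_name.string)
        print(item_name.a)

get_url_text('https://www.residentadvisor.net/podcast.aspx')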

returns:

RA.532 Marquis Hawkes
<a href="/podcast-episode.aspx?id=532"><h1>RA.532 Marquis Hawkes</h1></a>
RA.531 Evan Baggs
<a href="/podcast-episode.aspx?id=531"><h1>RA.531 Evan Baggs</h1></a>
RA.530 MCDE vs Jeremy Underground

If I only want the href value rather than the tags surrounding it, do I need to use a regex, or is there another method within BeautifulSoup?

Desired output is:

RA.532 Marquis Hawkes
https://www.residentadvisor.net/podcast-episode.aspx?id=532

for each similar element.

nipy
  • Possible duplicate of [Extracting an attribute value with beautifulsoup](http://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup) – Daniel Sep 07 '16 at 21:15
  • @DanielG I looked at the linked post and would not have been able to resolve this scenario using the information it contains. The answer below from ewcz is very useful. – nipy Sep 07 '16 at 21:22
  • `output = inputTag[value]` (where `inputTag=item_name.a`; and `value='href'` in your case) is very similar to what you were looking for, as described in the first answer of said post. But I'm glad you found an answer and your problem is solved now. – Daniel Sep 07 '16 at 21:31
  • Thanks for explaining DanielG. – nipy Sep 07 '16 at 21:34

1 Answer

You can use print(item_name.a['href']) and, if needed, prepend the prefix https://www.residentadvisor.net. The links in the page are relative: they omit the scheme and netloc parts, for example /podcast-episode.aspx?id=528.
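As a rough sketch of how this could look end to end, the snippet below reads the href attribute from each item and resolves it against the page URL with urljoin from the standard library, used here as an alternative to concatenating the prefix by hand. The inline SAMPLE_HTML and the helper name extract_links are assumptions made for illustration, modelled on the output shown in the question. The real page markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# URL of the page the relative links are resolved against
BASE_URL = 'https://www.residentadvisor.net/podcast.aspx'

# Hypothetical sample markup, modelled on the output in the question
SAMPLE_HTML = '''
<ul>
  <li class="ptb2"><a href="/podcast-episode.aspx?id=532"><h1>RA.532 Marquis Hawkes</h1></a></li>
  <li class="ptb2"><a href="/podcast-episode.aspx?id=531"><h1>RA.531 Evan Baggs</h1></a></li>
  <li class="ptb2"><h1>Item without a link</h1></li>
</ul>
'''


def extract_links(html, base_url):
    """Return (title, absolute_url) pairs for each linked list item."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('li', {'class': 'ptb2'}):
        link = item.a
        # Skip items that have no <a> tag or no href attribute
        if link is None or not link.get('href'):
            continue
        # link['href'] is the raw attribute value; urljoin makes it absolute
        results.append((link.get_text(strip=True), urljoin(base_url, link['href'])))
    return results


for title, url in extract_links(SAMPLE_HTML, BASE_URL):
    print(title)
    print(url)
```

Using link.get('href') for the check avoids a KeyError on anchors without an href, and urljoin leaves already-absolute links unchanged, so no regex is needed.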

ewcz