Regex required or can BeautifulSoup refine output

Question

If I use the following function I can grab the text and link I need from a website:

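import requests
from bs4 import BeautifulSoup

def get_url_text(url):
    # Download the page and parse it with an explicit parser.
    source = requests.get(url)
    plain_text = source.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    # Each podcast entry is an <li class="ptb2"> element.
    for item_name in soup.find_all('li', {'class': 'ptb2'}):
        print(item_name.string)  # title text
        print(item_name.a)       # the whole <a> tag, not just its href

get_url_text('https://www.residentadvisor.net/podcast.aspx')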

This prints:

RA.532 Marquis Hawkes
<a href="/podcast-episode.aspx?id=532"><h1>RA.532 Marquis Hawkes</h1></a>
RA.531 Evan Baggs
<a href="/podcast-episode.aspx?id=531"><h1>RA.531 Evan Baggs</h1></a>
RA.530 MCDE vs Jeremy Underground

If I only want the href value rather than the tags surrounding it, do I need to use a regex, or is there another method within BeautifulSoup?

Desired output is:

RA.532 Marquis Hawkes
https://www.residentadvisor.net/podcast-episode.aspx?id=532

for each similar element.

Possible duplicate of [Extracting an attribute value with beautifulsoup](http://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup) — Daniel, Sep 07 '16 at 21:15
@DanielG I looked at the linked post and would not have been able to resolve this scenario using the information it contains. The answer below from ewcz is very useful. — nipy, Sep 07 '16 at 21:22
`output = inputTag[value]` (where `inputTag=item_name.a`; and `value='href'` in your case) is very similar to what you were looking for, as described in the first answer of said post. But I'm glad you found an answer and your problem is solved now. — Daniel, Sep 07 '16 at 21:31

ewcz · Accepted Answer · 2016-09-07T21:10:04.237

3

You can use `print(item_name.a['href'])` and, if needed, prepend the prefix https://www.residentadvisor.net. The prefix is required because the links on the page are relative: they omit the scheme and netloc parts, for example `/podcast-episode.aspx?id=528`.
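
As an alternative to concatenating the prefix by hand, the standard library's `urllib.parse.urljoin` can resolve a relative href against the page URL. The sketch below is only an illustration: the helper name `absolute_url` and the constant `PAGE_URL` are made up for this example, and it assumes the hrefs look like the ones shown in the question.

```python
from urllib.parse import urljoin

# URL of the page that was scraped (taken from the question).
PAGE_URL = "https://www.residentadvisor.net/podcast.aspx"


def absolute_url(href, page_url=PAGE_URL):
    """Resolve an href found on the page against the page's own URL.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    Relative hrefs gain the scheme and host of page_url, while hrefs
    that are already absolute are returned unchanged.
    """
    return urljoin(page_url, href)


# Relative link, as it appears in the page's markup.
print(absolute_url("/podcast-episode.aspx?id=532"))
```

Inside the loop from the question this would be called as `print(absolute_url(item_name.a['href']))`. Compared with plain string concatenation, `urljoin` also copes with hrefs that are already absolute.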

edited Sep 07 '16 at 21:10

answered Sep 07 '16 at 21:07


Perfect, I will accept the answer when the system allows. Thanks. – nipy, Sep 07 '16 at 21:09
