If I use the following function, I can grab the text and link I need from a website:
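import requests
from bs4 import BeautifulSoup

def get_url_text(url):
    source = requests.get(url)
    plain_text = source.text
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(plain_text, 'html.parser')
    for item_name in soup.find_all('li', {'class': 'ptb2'}):
        print(item_name.string)
        print(item_name.a)

get_url_text('https://www.residentadvisor.net/podcast.aspx')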
This prints:
RA.532 Marquis Hawkes
<a href="/podcast-episode.aspx?id=532"><h1>RA.532 Marquis Hawkes</h1></a>
RA.531 Evan Baggs
<a href="/podcast-episode.aspx?id=531"><h1>RA.531 Evan Baggs</h1></a>
RA.530 MCDE vs Jeremy Underground
If I only want the href link rather than the tags surrounding it, do I need to use a regex, or is there another way to do this within BeautifulSoup?
Desired output is:
RA.532 Marquis Hawkes
https://www.residentadvisor.net/podcast-episode.aspx?id=532
for each similar element.
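
For reference, here is a minimal sketch of what I am after. It assumes the `href` attribute of the `<a>` tag can be read like a dictionary key, and that the relative path can be joined to the page URL with `urljoin` from the standard library. The `extract_links` helper and the `SAMPLE_HTML` markup are hypothetical: the sample is modelled on the output above so the example runs without a network request, and the real page may be structured differently.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the printed output above;
# the real page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="ptb2"><a href="/podcast-episode.aspx?id=532"><h1>RA.532 Marquis Hawkes</h1></a></li>
  <li class="ptb2"><a href="/podcast-episode.aspx?id=531"><h1>RA.531 Evan Baggs</h1></a></li>
  <li class="ptb2">No link here</li>
</ul>
"""


def extract_links(html, base_url):
    """Return (title, absolute_url) pairs for each matching list item."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("li", {"class": "ptb2"}):
        link = item.a
        # Skip items that have no <a> tag or no href attribute
        if link is None or not link.get("href"):
            continue
        title = link.get_text(strip=True)
        # Resolve the relative href against the page it came from
        results.append((title, urljoin(base_url, link["href"])))
    return results


if __name__ == "__main__":
    page_url = "https://www.residentadvisor.net/podcast.aspx"
    for title, url in extract_links(SAMPLE_HTML, page_url):
        print(title)
        print(url)
```

Against the live page, the same helper would presumably be called as `extract_links(requests.get(url).text, url)`, with no regex involved.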