
I have a list of HTML strings that I scraped from a website, and I want to extract the 'href' value from each one:

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>

For example, I'm looking to loop through the list and dynamically extract

/red-wine

from

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>

Thanks!

ken.b89
  • Use this reference: https://stackoverflow.com/questions/4666973/how-to-extract-a-substring-from-inside-a-string-in-python (this might be what you are looking for). – Samarth Jan 17 '18 at 07:38

2 Answers


You can use lxml for this. Something like this:

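from lxml import html
import requests

# Fetch the page and parse it into an element tree.
response = requests.get('<your url>')
tree = html.fromstring(response.text)
# The class "subnav__item" is on the <li>; the link is its child <a>.
hrefs = tree.xpath('//li[@class="subnav__item"]/a/@href')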

This should give you a list of the href values of all links inside elements with the class "subnav__item".
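
Since the question starts from a list of strings rather than a URL, here is a minimal sketch of the same path-based idea applied to each string in turn. It swaps lxml for the standard library's xml.etree.ElementTree, which only supports a limited XPath subset and only parses well-formed markup, so it assumes every string looks like the samples in the question. The names `items` and `extract_hrefs` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample strings copied from the question (hypothetical variable name).
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
]


def extract_hrefs(fragment):
    """Return the href of every <a> element nested in one well-formed fragment."""
    root = ET.fromstring(fragment)
    # ".//a[@href]" selects descendant <a> elements that carry an href attribute.
    return [a.get("href") for a in root.findall(".//a[@href]")]


# Flatten the per-string results into a single list.
hrefs = [href for item in items for href in extract_hrefs(item)]
print(hrefs)  # ['/red-wine', '/white-wine']
```

For messy real-world HTML (unclosed tags, entities such as &nbsp;), ElementTree will raise a ParseError, so a real HTML parser like lxml is probably the safer choice there.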

Gozy4

You can also get the required text using Beautiful Soup:

from bs4 import BeautifulSoup
data = '\
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>'
soup = BeautifulSoup(data, "html.parser")

# Find every <a> tag and print its href attribute.
links = soup.find_all('a')
for link in links:
    print(link['href'])

Output:

/red-wine
/white-wine
/rose-wine
/fine-wine
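
If installing Beautiful Soup is not an option, roughly the same loop can be sketched with the standard library's html.parser module. This is a swapped-in alternative, not the approach above; `HrefCollector`, `hrefs_from_strings`, and `wine_items` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_from_strings(strings):
    """Feed each scraped string to one parser and return all hrefs found."""
    collector = HrefCollector()
    for s in strings:
        collector.feed(s)
    collector.close()
    return collector.hrefs


# Sample strings copied from the question.
wine_items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
]
print(hrefs_from_strings(wine_items))  # ['/red-wine', '/rose-wine']
```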
Adeel Ahmad