How to extract a dynamic substring from a list of strings in Python?

Question

I have a list of strings I scraped from a website, and I want to extract the 'href' value from each one:

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>

For example, I'm looking to loop through the list and dynamically extract

/red-wine

from

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>

Thanks!

This might be what you are looking for: https://stackoverflow.com/questions/4666973/how-to-extract-a-substring-from-inside-a-string-in-python (Samarth, Jan 17 '18 at 07:38)
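Following that suggestion, here is a minimal sketch using Python's built-in re module to pull out the text between href=" and the next double quote. It assumes every href value is double-quoted and each string holds at most one link (both true for the samples above). A regex is brittle against arbitrary HTML, so a real parser is the safer choice for live pages:

```python
import re

# The four sample strings from the question.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

# Capture everything between href=" and the next double quote.
# Assumption: double-quoted values and at most one href per string.
HREF_RE = re.compile(r'href="([^"]*)"')

hrefs = []
for item in items:
    match = HREF_RE.search(item)
    if match:  # skip strings that contain no href at all
        hrefs.append(match.group(1))

print(hrefs)  # ['/red-wine', '/white-wine', '/rose-wine', '/fine-wine']
```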

score 1 · Answer 1 · answered Jan 17 '18 at 07:37

You can use lxml for this. Something like this:

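from lxml import html
import requests

# Fetch the page; replace '<your url>' with the real address.
response = requests.get('<your url>')
tree = html.fromstring(response.text)
# "subnav__item" is the class of the <li>, so match it there and step into its <a>.
hrefs = tree.xpath('//li[@class="subnav__item"]/a/@href')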

This should give you a list of all the href values from the links inside elements with the class "subnav__item".
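If the strings are already in memory rather than on a live page, the same XPath idea can be tried without a network request. The sketch below swaps lxml for the standard library's xml.etree.ElementTree, which understands only a limited XPath subset, and assumes each fragment is well-formed XML (true for the four samples, not for HTML in general):

```python
import xml.etree.ElementTree as ET

# The four sample strings from the question.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

# Wrap the fragments in one root element so they parse as a single document.
# Assumption: every fragment is well-formed XML (would fail on e.g. an unclosed <br>).
root = ET.fromstring("<ul>" + "".join(items) + "</ul>")

# ElementTree supports [@attr='value'] predicates; the class test sits on <li>,
# because "subnav__item" is the <li>'s class, not the <a>'s.
hrefs = [a.get("href") for a in root.findall("./li[@class='subnav__item']/a")]

print(hrefs)  # ['/red-wine', '/white-wine', '/rose-wine', '/fine-wine']
```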

score 1 · Accepted Answer · answered Jan 17 '18 at 07:39

You can also get the required text using Beautiful Soup:

from bs4 import BeautifulSoup

# The sample markup from the question, as one triple-quoted string.
data = '''
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>
'''
soup = BeautifulSoup(data, "html.parser")

# find_all is the current name for the older findAll alias.
for link in soup.find_all('a'):
    print(link['href'])

Output:

/red-wine
/white-wine
/rose-wine
/fine-wine
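Beautiful Soup's "html.parser" backend is built on Python's standard-library html.parser module. If adding a third-party package is not an option, a small HTMLParser subclass can collect the same values directly from the list of strings. The class name HrefCollector is made up for this sketch:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # ignore <a> tags without an href
                self.hrefs.append(href)


# The four sample strings from the question.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

parser = HrefCollector()
for item in items:
    parser.feed(item)
parser.close()  # process any input still buffered

for href in parser.hrefs:
    print(href)
```

Feeding every string into the same parser instance accumulates the links in their original order.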
