
I'm trying to get the title, "APPEND 668.104" and the href "https://www.google.co.uk" from the following HTML.

<div class="product-name">
    <h2 class="product-name" title=" Item1"> Item1</h2>
    <strong>
       <span class="append-number">APPEND 668.104</span>
    <div class="actions">
       <div class="price">…</div>
       <a class="cta-button-secondary" href="https://www.google.co.uk">More info</a>

This HTML is repeated several times throughout the site, so I need to store the title, the "APPEND" number and the href link for each occurrence. The title and the APPEND number work fine, but I can't get the href to work. Here is my code:

product_old = []
product_name = []
product_link = []

for product in soup.select('div.product-name'):
    n = product.h2
    r = product.select_one('span.append-number')
    p = product.select_one('href.cta-button-secondary')
    if n and r and p:
        product_old.append(r.get_text(strip=True).rsplit(maxsplit=1)[-1])
        product_name.append(n.get_text(strip=True))
        product_link.append(p)


for a, b, c in zip(product_old, product_name, product_link):
    #print('{:<6} {}'.format(a, b, c))
    continue

Thank you!

  • Does this answer your question? [BeautifulSoup getting href](https://stackoverflow.com/questions/5815747/beautifulsoup-getting-href) – RichieV Sep 03 '20 at 20:07
  • `product_link.append(p['href'])` – RichieV Sep 03 '20 at 20:08
  • What do you mean by _can't get the href to work_ ? Have you done any debugging? Please see [ask], [help/on-topic]. – AMC Sep 04 '20 at 00:03

2 Answers


Sorry I can't edit your code but this should point you in the right direction:

href = soup.find_all('a', attrs={'class':'cta-button-secondary'}, href=True)

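`find_all` returns a list of tags, so `href` here is a list of `<a>` elements rather than the URLs themselves. Iterate over it and read each tag's `href` attribute, e.g. `tag['href']`.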
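As a rough illustration of that iteration, here is a minimal sketch. The `html` string is a hypothetical, closed-off version of the snippet from the question, and `html.parser` is assumed as the parser; neither comes from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet in the question.
html = """
<div class="product-name">
    <h2 class="product-name" title=" Item1"> Item1</h2>
    <strong><span class="append-number">APPEND 668.104</span></strong>
    <div class="actions">
        <a class="cta-button-secondary" href="https://www.google.co.uk">More info</a>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list of Tag objects; href=True keeps only tags
# that actually carry an href attribute.
anchors = soup.find_all("a", attrs={"class": "cta-button-secondary"}, href=True)

# Subscripting a Tag with an attribute name returns that attribute's value.
links = [a["href"] for a in anchors]
print(links)
```

Note that this collects every matching link on the page in one pass, so it does not by itself keep each link paired with its product title.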

Paul Wilson
product_old = []

product_name = []
product_link = []
for product in soup.select('div.product-name'):
    n = product.h2
    r = product.select_one('span.append-number')
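    # 'href' is an attribute, not a tag name: select the <a> by its class first.
    p = product.select_one('a.cta-button-secondary')
    if n and r and p:
        product_old.append(r.get_text(strip=True).rsplit(maxsplit=1)[-1])
        product_name.append(n.get_text(strip=True))
        # Read the attribute only after confirming the tag exists, so a
        # product without a link is skipped instead of raising TypeError.
        product_link.append(p['href'])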
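For anyone who wants to try this end to end, the following is a self-contained sketch under a few assumptions: the `html` sample (including a second product with no link) and the helper name `extract_products` are hypothetical, and it returns one tuple per product instead of three parallel lists so the fields cannot drift out of step.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two products, the second one has no "More info" link.
html = """
<div class="product-name">
    <h2 class="product-name" title=" Item1"> Item1</h2>
    <strong><span class="append-number">APPEND 668.104</span></strong>
    <div class="actions">
        <a class="cta-button-secondary" href="https://www.google.co.uk">More info</a>
    </div>
</div>
<div class="product-name">
    <h2 class="product-name" title=" Item2"> Item2</h2>
    <strong><span class="append-number">APPEND 700.200</span></strong>
</div>
"""


def extract_products(markup):
    """Return (number, title, link) tuples for products that have all three."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for product in soup.select("div.product-name"):
        title = product.h2
        number = product.select_one("span.append-number")
        anchor = product.select_one("a.cta-button-secondary")
        # select_one returns None when nothing matches, so check before use.
        if title and number and anchor:
            results.append((
                number.get_text(strip=True).rsplit(maxsplit=1)[-1],
                title.get_text(strip=True),
                anchor["href"],
            ))
    return results


products = extract_products(html)
for number, title, link in products:
    print(f"{number:<10} {title:<10} {link}")
```

The second product is skipped because it has no matching `<a>`; whether to skip such products or record them with an empty link depends on what the rest of the script expects.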
the_train