
I am trying to build a list of the links that are inside a product page.

I have multiple search pages from which I want to collect the product links; I am posting the code for just a single one.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://funskoolindia.com/products.php?search=9723100")
soup = BeautifulSoup(r.content)
for a_tag in soup.find_all('a', class_='product-bg-panel', href=True):
    print('href: ', a_tag['href'])

This is what it should print: https://funskoolindia.com/product_inner_page.php?product_id=1113

james joyce
  • Possible duplicate of [BeautifulSoup getting href](https://stackoverflow.com/questions/5815747/beautifulsoup-getting-href) – m13op22 Aug 16 '19 at 14:26
  • Maybe [this](https://stackoverflow.com/questions/41745514/getting-the-href-of-a-tag-which-is-in-li)? – m13op22 Aug 16 '19 at 14:28

3 Answers


The site is dynamic, so you can use selenium:

from bs4 import BeautifulSoup as soup
from selenium import webdriver

d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://funskoolindia.com/products.php?search=9723100')
# parse the rendered page source and collect the unique product links
results = [*{i.a['href'] for i in soup(d.page_source, 'html.parser').find_all('div', {'class': 'product-media light-bg'})}]

Output:

['product_inner_page.php?product_id=1113']
Ajax1234

Try this: `print('href: ', a_tag.get("href"))`, and add `features="lxml"` to the `BeautifulSoup` constructor.
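
For reference, applied to the snippet from the question that would look like the sketch below (it requires `lxml` to be installed; note that this alone still prints nothing here, since, as the other answers point out, the links are injected by JavaScript):

import requests
from bs4 import BeautifulSoup

r = requests.get("https://funskoolindia.com/products.php?search=9723100")
# an explicit parser avoids BeautifulSoup's "no parser was explicitly specified" warning
soup = BeautifulSoup(r.content, features="lxml")

for a_tag in soup.find_all('a', class_='product-bg-panel', href=True):
    # .get() returns None instead of raising KeyError when the attribute is missing
    print('href: ', a_tag.get('href'))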

chiko360

The data is loaded dynamically through JavaScript from a different URL. One solution is to use selenium, which executes the JavaScript and loads the links that way.

Another solution is to use the re module and query the data URL manually:

import re
import requests
from bs4 import BeautifulSoup

url = 'https://funskoolindia.com/products.php?search=9723100'
data_url = 'https://funskoolindia.com/admin/load_data.php'

data = {
    'page': '1',
    'sort_val': 'new',
    'product_view_val': 'grid',
    'show_list': '12',
    'brand_id': '',
    # the search page embeds a key in its Javascript; extract it and send it with the POST
    'checkboxKey': re.findall(r'var checkboxKey = "(.*?)";', requests.get(url).text)[0]}

soup = BeautifulSoup(requests.post(data_url, data=data).text, 'lxml')

for a in soup.select('#list-view .product-bg-panel > a[href]'):
    print('https://funskoolindia.com/' + a['href'])

Prints:

https://funskoolindia.com/product_inner_page.php?product_id=1113
Andrej Kesely
  • This works fine, but now I have to get the details of the product from the extracted URLs. I think those will be dynamic too, so what do I do? Will this `re` method work on the extracted links, or do I have to use selenium? – james joyce Aug 16 '19 at 14:56
  • @jamesjoyce You can experiment. `selenium` has its overhead, so it's slower than the `requests` + `re` method. I suggest looking at the Chrome/Firefox developer tools to see where the page loads its data from, and then using that URL with `requests`. – Andrej Kesely Aug 16 '19 at 14:59
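
A quick way to run that experiment (a sketch only, following the suggestion above; whether the product details actually appear depends on whether the inner page is rendered server-side, which is not confirmed here):

import requests
from bs4 import BeautifulSoup

# one of the links extracted above; fetch it without a browser and inspect the raw HTML
detail_url = 'https://funskoolindia.com/product_inner_page.php?product_id=1113'
soup = BeautifulSoup(requests.get(detail_url).text, 'lxml')
# if the product details show up in this dump, plain requests is enough;
# otherwise fall back to the developer-tools approach above (or selenium)
print(soup.get_text(' ', strip=True)[:500])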