
I am new to this and I am trying to get the links of every product across all subpages (1-8) of this web page: https://www.sodimac.cl/sodimac-cl/category/scat359268/Esmaltes-al-agua

I have a loop that goes over each page, but for some reason it only returns 20 products on page 7 and no products on page 8.

This is the function that gets all the product URLs on a given page:

import requests
from bs4 import BeautifulSoup

def get_all_product_url(base_url):
    # Download the category page and parse its HTML
    page = requests.get(base_url, stream=True)
    soup = BeautifulSoup(page.content, 'html.parser', from_encoding='utf-8')
    url_list = []
    # find_all returns an empty list when nothing matches, so no try/except is needed
    products = soup.find_all('div', {'class': 'jsx-3418419141 product-thumbnail'})
    for product in products:
        link = product.find("a")
        # Skip thumbnails that have no link or no href attribute
        if link is None or not link.get('href'):
            continue
        url = link.get('href')
        if url.startswith('https://www.sodimac.cl'):
            url_list.append(url)
        else:
            url_list.append('https://www.sodimac.cl' + url)
    # Return all web addresses without duplicates
    return list(set(url_list))
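
The absolute/relative check in the loop can also be left to `urljoin` from the standard library, which returns an already absolute link unchanged and resolves a relative one against the site root. This is only a minimal sketch, not part of the function above; `absolutize` and `BASE` are names made up here for illustration.

```python
from urllib.parse import urljoin

# Assumed site root, taken from the URLs in the question
BASE = "https://www.sodimac.cl"

def absolutize(hrefs, base=BASE):
    """Return unique absolute URLs, keeping first-seen order."""
    seen = {}
    for href in hrefs:
        # Ignore missing or empty href values
        if href:
            seen.setdefault(urljoin(base, href), None)
    return list(seen)
```

Unlike `list(set(...))`, this keeps the products in the order they appear on the page.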

When I run it for page 8, I get an empty list:

# get_all_product_url fetches and parses the page itself
base_url = "https://www.sodimac.cl/sodimac-cl/category/scat359268/Esmaltes-al-agua?currentpage=8"
url_list = get_all_product_url(base_url)
url_list

If you run it for page 1, you get 28 entries:

# Same call, pointed at the first page
base_url = "https://www.sodimac.cl/sodimac-cl/category/scat359268/Esmaltes-al-agua?currentpage=1"
url_list = get_all_product_url(base_url)
url_list

I would really appreciate any help.

Thanks

  • It's because page 1 returns "Tu búsqueda de “” no arrojó resultados." BTW, it's "scraping", not "scrapping." – Mike O'Connor Sep 26 '20 at 03:18
  • Thanks Mike, But I am not following you. – ruben_rosa Sep 26 '20 at 03:30
  • These are some of the elements I get for page 1 - ['https://www.sodimac.cl/sodimac-cl/product/3624439/Esmalte-al-Agua-Premium-Satinado-Morado-Angola-1-GL/3624439', 'https://www.sodimac.cl/sodimac-cl/product/3624412/Esmalte-al-Agua-Premium-Satinado-Verde-Mumbai-1-GL/3624412', 'https://www.sodimac.cl/sodimac-cl/product/3624498/Esmalte-al-Agua-Premium-Satinado-Rojo-Diomede-1-GL/3624498', 'https://www.sodimac.cl/sodimac-cl/product/3624447/Esmalte-al-Agua-Premium-Satinado-Verde-Foggia-1-GL/3624447', – ruben_rosa Sep 26 '20 at 03:33
  • You get empty on Page 8 because there are only 7 pages. – SaaSy Monster Sep 26 '20 at 03:52
  • So sorry, I just wrote page 1 when I was supposed to write page 8. When I paste https://www.sodimac.cl/sodimac-cl/category/scat359268/Esmaltes-al-agua?currentpage=8 into my browser, I still get a page that says "Tu búsqueda de no arrojó resultados", whereas with "=1" on the end I get a proper listing. So Ares Zephyr got it right. The site redirects the HTTP request to the "no resultados" page. – Mike O'Connor Sep 26 '20 at 07:50
  • Hi folks, I just found out: I opened it in another browser (IE) and there are only 7 pages, but Chrome is showing more pages. I guess it is reading that from my history. Thanks a million to both of you. – ruben_rosa Sep 26 '20 at 10:06
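
As the comments point out, the category has only 7 pages, so a request past the last page yields no products. Below is a rough sketch of a loop that stops at the first empty page instead of hardcoding the page count. `collect_all_pages` and `max_pages` are hypothetical names, and the sketch assumes `?currentpage=N` is how the site paginates, as in the URLs in the question.

```python
def collect_all_pages(get_urls, category_url, max_pages=50):
    """Gather product URLs page by page until a page comes back empty.

    get_urls is any function that takes a page URL and returns a list
    of product URLs, for example get_all_product_url from the question.
    max_pages is a safety cap (an assumption) so the loop always ends.
    """
    all_urls = []
    for page_number in range(1, max_pages + 1):
        page_urls = get_urls(f"{category_url}?currentpage={page_number}")
        # An empty result is treated as "past the last page"
        if not page_urls:
            break
        all_urls.extend(page_urls)
    return all_urls
```

It could be called as `collect_all_pages(get_all_product_url, "https://www.sodimac.cl/sodimac-cl/category/scat359268/Esmaltes-al-agua")`. Passing the fetching function in as an argument also makes the loop easy to try out without hitting the site.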
