I'm trying to scrape product images from a website, without success. As far as I can tell, the classes and tags in my selectors match the page, but the images list comes back empty, i.e., []. I need help getting the image links.

The code is:

from pandas_datareader import data as pdr
import numpy as np
import pandas as pd
from datetime import datetime,timedelta
from bs4 import BeautifulSoup
import requests
import math
import re
from requests_html import HTMLSession, AsyncHTMLSession
from lxml import etree
import xlwt
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
        


url_base='https://www.compracerta.com.br/celulares-e-smartphones?page=1'
executable_path = r'C:\Users\vinig\Downloads\chromedriver_win32\chromedriver.exe'
browser = webdriver.Chrome(executable_path=executable_path)
browser.get(url_base)
time.sleep(5)
WebDriverWait(browser,40).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
links = [my_elem.get_attribute("href") for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info")]
titles = [my_elem.text for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info h3 p")]
prices = [my_elem.text for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info p span.por > span")]
images = [my_elem.text for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info img.product_img_img__ac vtex-render-runtime-8-x-lazyload ls-is-cached lazyloaded")]
browser.quit()
print(links, titles, prices, images)

The images list comes back empty, i.e., [], as shown in the screenshot "Error Return".

The element markup on the page: [screenshot]

Davi Riani

2 Answers

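I think the problem is your CSS selector: in CSS a space means "descendant", so the space-separated class names after img.product_img_img__ac are read as nested tag names rather than as extra classes on the same img, and nothing matches. I personally prefer XPath. Moreover, img elements don't have text (as far as I know), so .text would not give you anything useful even with a matching selector. You can get the alt attribute instead: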

import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
        
url_base='https://www.compracerta.com.br/celulares-e-smartphones?page=1'
# Run Chrome headless with a fixed window size
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1920,1080')
browser = webdriver.Chrome(options=chrome_options)
browser.get(url_base)
time.sleep(5)
WebDriverWait(browser,40).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
links = [my_elem.get_attribute("href") for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info")]
titles = [my_elem.text for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info h3 p")]
prices = [my_elem.text for my_elem in browser.find_elements(By.CSS_SELECTOR, "article.box-produto a.prod-info p span.por > span")]
images = [my_elem.get_attribute('alt') for my_elem in browser.find_elements(By.XPATH, "//img[@class='product_img_img__ac vtex-render-runtime-8-x-lazyload lazyloaded']")]
browser.quit()
print(images)

Output:

['iPhone 12 64GB - Azul', 'Smartphone Samsung Galaxy A03S 64GB 4GB RAM 4G Wi-Fi Dual Chip Câmera Tripla + Selfie 5MP 6.5" Preto', 'iPhone 11 128GB - Preto', 'Smartphone Motorola Moto G32 128GB 4GB RAM 6,5"- Preto', 'Smartphone Motorola Moto G32 128GB 4GB RAM 6,5"- Rose', 'Smartphone Motorola Moto Edge 30 Ultra 256GB 12GB RAM Câmera Tripla 200 MP OIS 50 MP 12 MP Tela 6.7" White']
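If you would rather stay with CSS selectors, the classes have to be chained with dots instead of spaces. Here is a minimal sketch of that conversion; class_attr_to_css is a hypothetical helper name, not part of Selenium, and it assumes the class names contain no characters that need CSS escaping:

```python
def class_attr_to_css(tag: str, class_attr: str) -> str:
    """Build a compound CSS selector from a raw class attribute value.

    Hypothetical helper for illustration; assumes plain class names
    that do not require CSS escaping.
    """
    classes = class_attr.split()
    # "img" + ".a" + ".b" -> "img.a.b": every class must be on the same element
    return tag + "".join("." + name for name in classes)


selector = class_attr_to_css(
    "img", "product_img_img__ac vtex-render-runtime-8-x-lazyload lazyloaded"
)
print(selector)
# img.product_img_img__ac.vtex-render-runtime-8-x-lazyload.lazyloaded
```

The resulting string can be passed to browser.find_elements(By.CSS_SELECTOR, selector). Unlike the exact-match XPath above, it still matches when the element carries additional classes, although depending on state classes such as lazyloaded is probably brittle either way.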
Tranbi

You were almost there. An img element carries its image URL in attributes, not in its text, so read src or data-src with get_attribute() inside a list comprehension. You can use the following locator strategies:

  • Printing src attribute:

    driver.get('https://www.compracerta.com.br/celulares-e-smartphones?page=1')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    print([my_elem.get_attribute('src') for my_elem in driver.find_elements(By.CSS_SELECTOR, "article.box-produto a.image > img")])
    
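  • Console output (the trailing None values are images with no src set, presumably because they have not been lazy-loaded yet):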

    ['https://compracerta.vtexassets.com/arquivos/ids/556473-266-266/image-cb2c596eae2c4291aefd87248dae47bb.jpg?v=637683681866900000', 'https://compracerta.vtexassets.com/arquivos/ids/539288-266-266/image-c4b1776f2c6249bb9c951e60a1f7d251.jpg?v=637677626053400000', 'https://compracerta.vtexassets.com/arquivos/ids/457660-266-266/image-1225eff99f1242338f3c5fe6daede49b.jpg?v=637504852415570000', 'https://compracerta.vtexassets.com/arquivos/ids/707178-266-266/image-fe45c397563b4a63b9e87bbc3903b4e6.jpg?v=638094758268370000', 'https://compracerta.vtexassets.com/arquivos/ids/457165-266-266/image-b7293d45b28447988dbf245afdf93255.jpg?v=637498804044400000', 'https://compracerta.vtexassets.com/arquivos/ids/707183-266-266/image-c448c36dead64593ae984592f965b969.jpg?v=638094758658100000', None, None, None]
    
  • Printing data-src attribute:

    driver.get('https://www.compracerta.com.br/celulares-e-smartphones?page=1')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    print([my_elem.get_attribute('data-src') for my_elem in driver.find_elements(By.CSS_SELECTOR, "article.box-produto a.image > img")])
    
  • Console output:

    ['https://compracerta.vtexassets.com/arquivos/ids/556473-266-266/image-cb2c596eae2c4291aefd87248dae47bb.jpg?v=637683681866900000', 'https://compracerta.vtexassets.com/arquivos/ids/539288-266-266/image-c4b1776f2c6249bb9c951e60a1f7d251.jpg?v=637677626053400000', 'https://compracerta.vtexassets.com/arquivos/ids/457660-266-266/image-1225eff99f1242338f3c5fe6daede49b.jpg?v=637504852415570000', 'https://compracerta.vtexassets.com/arquivos/ids/707178-266-266/image-fe45c397563b4a63b9e87bbc3903b4e6.jpg?v=638094758268370000', 'https://compracerta.vtexassets.com/arquivos/ids/457165-266-266/image-b7293d45b28447988dbf245afdf93255.jpg?v=637498804044400000', 'https://compracerta.vtexassets.com/arquivos/ids/707183-266-266/image-c448c36dead64593ae984592f965b969.jpg?v=638094758658100000', 'https://compracerta.vtexassets.com/arquivos/ids/657901-266-266/image-fb1a91a8087b49bdafcf63627aed126c.jpg?v=638007545659400000', 'https://compracerta.vtexassets.com/arquivos/ids/626628-266-266/image-9a3663c83d6548f6aa38e870639785c5.jpg?v=637915917908930000', 'https://compracerta.vtexassets.com/arquivos/ids/624906-266-266/image-0769cd2d559742e3a84adf5fe499279a.jpg?v=637903787294170000']
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
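
Since src came back as None for some images while data-src was set for all of them in the output above, one option is to read data-src first and fall back to src. This is only a sketch: image_url is a hypothetical helper, and FakeImg is a stand-in for a Selenium WebElement so the example runs without a browser.

```python
from typing import Optional


class FakeImg:
    """Hypothetical stand-in for a Selenium WebElement (get_attribute only)."""

    def __init__(self, attrs: dict):
        self._attrs = attrs

    def get_attribute(self, name: str) -> Optional[str]:
        # Like Selenium, return None when the attribute is missing
        return self._attrs.get(name)


def image_url(element) -> Optional[str]:
    """Prefer data-src, fall back to src, return None if neither is set."""
    return element.get_attribute("data-src") or element.get_attribute("src")


loaded = FakeImg({"src": "https://example.com/a.jpg",
                  "data-src": "https://example.com/a.jpg"})
pending = FakeImg({"data-src": "https://example.com/b.jpg"})  # not lazy-loaded yet
plain = FakeImg({"src": "https://example.com/c.jpg"})         # no data-src at all

print([image_url(img) for img in (loaded, pending, plain)])
```

With a real driver, the same function could be applied as [image_url(e) for e in driver.find_elements(By.CSS_SELECTOR, "article.box-produto a.image > img")].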
    
undetected Selenium