Continuous syntax error and no output in my web scraping program. I believe my XPath is correct, since it points to the right product names, but I am not getting any output. The website is https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1. Can someone help?

I have Python 3.4.4 and I am using Visual Studio Code as my editor. I am trying to scrape the item names from the IKEA website, but I keep getting the error.

import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException, WebDriverException
import csv
import os

driver= webdriver.Chrome("C:/Python34/Scripts/chromedriver.exe")
driver.get("https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1")
title =driver.findElement(By.XPath("//span[@class='prodName prodNameTro']")).text()
print(title) 

Expected Output:

RENBERGET
HÄRÖ / FEJAN
ÄPPLARÖ
TÄRENDÖ / ADDE
AGAM
ÄPPLARÖ

These are the names of the items on the page.


2 Answers

Because the page contains multiple products, you can collect all the matching elements in a list and then print the text of each one.
You can do it like this:

driver = webdriver.Chrome("C:/Python34/Scripts/chromedriver.exe")
driver.get("https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1")
productNames = driver.find_elements_by_xpath("//span[contains(@id,'txtNameProduct')]")
for product in productNames:
    print(product.text)
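
For reference, the Python bindings spell these calls differently from the question's code: the methods are `find_element` / `find_elements`, the locator constant is `By.XPATH`, and `text` is a property rather than a method. Below is a minimal sketch of a reusable helper under those assumptions. `collect_product_names` and `PRODUCT_XPATH` are hypothetical names, and the plain string `"xpath"` stands in for `By.XPATH` (it is that constant's value) so the function has no import of its own.

```python
# XPath copied from the snippet above; whether it still matches the live page
# is an assumption.
PRODUCT_XPATH = "//span[contains(@id,'txtNameProduct')]"


def collect_product_names(driver):
    """Return the non-empty text of every element matching PRODUCT_XPATH."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", PRODUCT_XPATH)
    names = []
    for element in elements:
        text = element.text.strip()  # .text is a property, not a method call
        if text:
            names.append(text)
    return names
```

With a real driver you would call `print(collect_product_names(driver))` after `driver.get(...)`.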
Sameer Arora

To scrape this page with Python you don't need Selenium. It is much faster and easier to use, for example, requests and BeautifulSoup.
Here is a basic example that searches for chairs on IKEA; you'll get all the chairs (723) from all pages in seconds:

import requests
from bs4 import BeautifulSoup

# list of all scraped items
result = []

# request the first page of the query ("query=chair&pageNumber=1")
response = requests.get('https://www.ikea.com/sa/en/search/?query=chair&pageNumber=1')
# make sure the response is OK
assert response.ok

# parse the response text using html.parser
page = BeautifulSoup(response.text, "html.parser")

# get the last page number and convert it to an integer
last_page_number = int(page.select_one(".pagination a:last-child").text)

# iterate through the pages, from 1 to last_page_number
for i in range(1, last_page_number + 1):
    # for i == 1, skip the request because we already have the response for the first page
    if i > 1:
        # request the page, using i as the pageNumber parameter
        response = requests.get(f'https://www.ikea.com/sa/en/search/?query=chair&pageNumber={i}')
        assert response.ok
        page = BeautifulSoup(response.text, "html.parser")

    # get all product containers; each one holds a name, price and description
    products = page.select("#productsTable .parentContainer")

    # iterate through all products on the page; get name, price and description and append them to result as a dict
    for product in products:
        name = product.select_one(".prodName").text.strip()
        desc = product.select_one(".prodDesc").text.strip()
        price = product.select_one(".prodPrice,.prodNlpTroPrice").text.strip()
        result.append({"name": name, "desc": desc, "price": price})

# print the results; you can do anything you like with them
for r in result:
    print(f"name: {r['name']}, price: {r['price']}, description: {r['desc']}")

print("the end")
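
One caveat: the code above uses f-strings, which require Python 3.6 or later, while the question mentions Python 3.4.4. On an older interpreter the two f-string lines can be written with `str.format` instead. A minimal sketch, assuming the same URL pattern and result keys as above; the helper names `page_url` and `format_item` are hypothetical and the item values are made-up sample data:

```python
# URL pattern taken from the code above; "{}" is filled in by str.format.
BASE_URL = "https://www.ikea.com/sa/en/search/?query=chair&pageNumber={}"


def page_url(page_number):
    """Build the search URL for one results page without using an f-string."""
    return BASE_URL.format(page_number)


def format_item(item):
    """Format one result dict (keys: name, price, desc) for printing."""
    return "name: {name}, price: {price}, description: {desc}".format(**item)


# Made-up sample item, only to show the output shape.
sample = {"name": "AGAM", "price": "sample price", "desc": "sample description"}
print(page_url(2))
print(format_item(sample))
```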
Sers
  • Thank you so much sir, you are the best (Y) – Hassan Sultan Mar 30 '19 at 17:53
  • In the lines `products = page.select("#productsTable .parentContainer")` and `last_page_number = int(page.select_one(".pagination a:last-child").text)`, what do `#productsTable` and `.pagination a:last-child` mean? – Hassan Sultan Mar 30 '19 at 18:16
  • Open the Elements tab in Chrome DevTools, press Ctrl+F to search, and enter `#productsTable .parentContainer`; you'll see the product divs. The same goes for `.pagination a:last-child`, which gets the last page number in the pagination. – Sers Mar 30 '19 at 18:21
  • I want to learn from this amazing code of yours and I hope you can help me with that: the HTML has `div class="parentContainer"` and `div id="productsTable" class="productsContainer"`, so how did you combine them into `#productsTable .parentContainer`? Likewise, how do `div class="pagination"` and `a:last-child` become `.pagination a:last-child`? Is there some trick to it? – Hassan Sultan Apr 01 '19 at 19:55
  • There is no trick; these are basic CSS selectors. You can read about them and practice here: https://www.w3schools.com/cssref/css_selectors.asp. You should also learn how to use XPath. Both are basics and a must for testing or scraping. – Sers Apr 01 '19 at 20:21
  • If we also want to add the image links, how should the code be amended? – Hassan Sultan Apr 02 '19 at 19:11
  • Find the `img` tag and get its `src` attribute. – Sers Apr 02 '19 at 19:13
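
To make the selector discussion in the comments concrete, here is a small sketch on an invented HTML fragment. In a CSS selector, `#` matches an id, `.` matches a class, a space means "any descendant of", and `a:last-child` matches an `a` that is the last child of its parent. To stay dependency-free, the sketch swaps in the standard library's `xml.etree.ElementTree` and its limited XPath support instead of BeautifulSoup, so it needs well-formed markup and matches the `class` attribute exactly. The markup, image paths and the `OUTSIDE` item are assumptions for illustration, not the real IKEA page.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup; the real page structure may differ.
SAMPLE = """
<div>
  <div id="productsTable" class="productsContainer">
    <div class="parentContainer">
      <img src="/images/chair-1.jpg" />
      <span class="prodName">AGAM</span>
    </div>
    <div class="parentContainer">
      <img src="/images/chair-2.jpg" />
      <span class="prodName">RENBERGET</span>
    </div>
  </div>
  <div class="parentContainer">
    <span class="prodName">OUTSIDE</span>
  </div>
  <div class="pagination">
    <a>1</a>
    <a>2</a>
    <a>30</a>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)

# "#productsTable .parentContainer": first find the element whose id is
# productsTable, then every descendant whose class is parentContainer.
table = root.find(".//*[@id='productsTable']")
products = table.findall(".//*[@class='parentContainer']")

# For each product container, read the name and the image link (img src).
names = [p.find(".//*[@class='prodName']").text.strip() for p in products]
image_links = [p.find(".//img").get("src") for p in products]

# ".pagination a:last-child": the last <a> inside the pagination element.
pagination = root.find(".//*[@class='pagination']")
last_page_number = int(pagination.findall("a")[-1].text)

print(names)
print(image_links)
print(last_page_number)
```

The container outside `#productsTable` is skipped because the search starts from the table element. With BeautifulSoup, the image step corresponds to selecting the `img` tag inside each product and reading its `src` attribute, as the last comment suggests.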