The main issue you had is an indentation problem, which meant your script only processed the last object found on the page.
Another issue I saw is that you were putting all the titles together, all the old prices together, and so on.
This makes it difficult to tell which price belongs to which item when, for example, an item has missing data.
To solve this I collect all the items of a single webpage into the variable "products" and read each field from one item at a time, as in the sketch below.
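In practice that means searching inside each product element instead of across the whole page. Here is a minimal, self-contained sketch with made-up HTML (the class names are the ones the real page uses, same as in the full script further down):

from bs4 import BeautifulSoup

# made-up HTML for illustration: the first product has no old price
html = ('<a class="single-product"><div class="product-title">Batik A</div></a>'
        '<a class="single-product"><div class="product-title">Batik B</div>'
        '<span class="old-price-text">Rp 100.000</span></a>')
soup = BeautifulSoup(html, 'html.parser')
for product in soup.findAll("a", {"class": "single-product"}):
    # every lookup is scoped to this one product card, so a missing old price
    # cannot shift the data of the following items
    title = product.find("div", {"class": "product-title"}).text
    old_price = product.find("span", {"class": "old-price-text"})
    print(title, old_price.text if old_price else "Not available")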
About the "append" or "write" option of the CSV in my implementation I check as first thing if the result.csv file exists.
Then we have two cases:
- result.csv doesn't exist: I create it and I put headers in
- result.csv already exists: it means that header is already in place and I can simply append new rows when looping
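In isolation that decision looks like this (a minimal sketch of just this step; the full script below wraps it in helper functions):

import csv
import os.path

if not os.path.isfile('result.csv'):
    # first run: create the file and write the header row exactly once
    with open('result.csv', 'w', newline='') as f:
        csv.writer(f).writerow(["Product Name", "Sale Price", "Discount", "Old Price", "Link"])
# from here on, every run can safely open result.csv in append mode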
To get the data out easily I've used BeautifulSoup (install it with "pip install beautifulsoup4").
There are still several challenges ahead because the data on this webpage is not consistent, but the following example should be enough to get you going.
Please keep in mind that the "break" in the code stops the scraping at the 1st page.
import csv
# YouTube Video: https://www.youtube.com/watch?v=zjo9yFHoUl8
from selenium import webdriver
from bs4 import BeautifulSoup
import os.path
MAX_PAGE_NUM = 67
MAX_PAGE_DIG = 1
driver = webdriver.Chrome('/Users/reezalaq/PycharmProjects/untitled2/venv/driver/chromedriver')
#driver = webdriver.Chrome()
def write_csv_header():
    # create the file and write the column names; note the "Link" column,
    # which the rows written below also contain
    with open('result.csv', 'w', newline='') as f:
        csv.writer(f).writerow(["Product Name", "Sale Price", "Discount", "Old Price", "Link"])

def write_csv_row(product_title, product_new_price, product_discount, product_old_price, product_link):
    # append one product per row; csv.writer takes care of quoting any commas in the values
    with open('result.csv', 'a', newline='') as f:
        csv.writer(f).writerow([product_title, product_new_price, product_discount, product_old_price, product_link])
if not os.path.isfile('result.csv'):
    write_csv_header()  # first run only: the header must not be rewritten on later runs
for i in range(1, MAX_PAGE_NUM + 1):
    page_num = str(i).zfill(MAX_PAGE_DIG)  # zero-pad the page number up to MAX_PAGE_DIG digits
    url = "https://www.blibli.com/jual/batik-pria?s=batik+pria&c=BA-1000013&i=" + page_num
    driver.get(url)
    source = driver.page_source
    soup = BeautifulSoup(source, 'html.parser')
    # every product card on the page is an <a class="single-product"> element
    products = soup.findAll("a", {"class": "single-product"})
    for product in products:
        # each field is optional on the page, so fall back to a placeholder when missing
        try:
            product_title = product.find("div", {"class": "product-title"}).text.strip()
        except AttributeError:
            product_title = "Not available"
        try:
            product_new_price = product.find("span", {"class": "new-price-text"}).text.strip()
        except AttributeError:
            product_new_price = "Not available"
        try:
            product_old_price = product.find("span", {"class": "old-price-text"}).text.strip()
        except AttributeError:
            product_old_price = "Not available"
        try:
            product_discount = product.find("div", {"class": "discount"}).text.strip()
        except AttributeError:
            product_discount = "Not available"
        try:
            product_link = product['href']
        except KeyError:
            product_link = "Not available"
        write_csv_row(product_title, product_new_price, product_discount, product_old_price, product_link)
    break  # this stops the parsing at the 1st page. I think it is a good idea to check data and fix all discrepancies before proceeding
driver.close()
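Once a run has finished you can sanity-check the output by reading result.csv back with the standard csv module:

import csv

with open('result.csv', newline='') as f:
    for row in csv.reader(f):
        print(row)  # [name, sale price, discount, old price, link]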