
I'm trying to scrape more than 10 pages of reviews from https://www.innisfree.com/kr/ko/ProductReviewList.do

However, when I move to the next page and try to get that page's reviews, I still get only the first page's reviews.

I used driver.execute_script("goPage(2)") and also time.sleep(5), but my code still returns only the first page's reviews.

(I did not use a for loop here, just to see whether the results differ between page 1 and page 2. BeautifulSoup and Selenium are imported.)

Here is my code:

  url = "https://www.innisfree.com/kr/ko/ProductReviewList.do"
  chromedriver = r'C:\Users\hhm\Downloads\chromedriver_win32\chromedriver.exe'
  driver = webdriver.Chrome(chromedriver)
  driver.get(url)

  print("this is page 1")
  driver.execute_script("goPage(1)")
  nTypes = soup.select('.reviewList ul .newType div[class^=reviewCon] .reviewConTxt')

  for nType in nTypes:
      product = nType.select_one('.pdtName').text
      print(product)

  print('\n')
  print("this is page 2")
  driver.execute_script("goPage(2)")
  time.sleep(5)
  nTypes = soup.select('.reviewList ul .newType div[class^=reviewCon] .reviewConTxt')

  for nType in nTypes:
      product = nType.select_one('.pdtName').text
      print(product)
Heamin Han

2 Answers


If the second page opens in a new window, Selenium keeps controlling the original window, so you need to switch its control to the new window before reading the page.

Example:

# Opens a new tab
self.driver.execute_script("window.open()")

# Switch to the newly opened tab
self.driver.switch_to.window(self.driver.window_handles[1])

Source:

How to switch to new window in Selenium for Python?

https://www.techbeamers.com/switch-between-windows-selenium-python/
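
To check whether a new window was opened at all before switching, something like the following sketch may help. `switch_to_new_window` and `FakeDriver` are hypothetical names used only for illustration; the fake driver stands in for a Selenium driver so the sketch runs without a browser. The only real WebDriver members relied on are `window_handles` and `switch_to.window`.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window handle that is not in known_handles.

    Returns the new handle, or None if no new window was opened.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None


class _FakeSwitchTo:
    """Mimics driver.switch_to for the demonstration below (assumption)."""
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Stand-in for a Selenium driver (assumption, demonstration only)."""
    def __init__(self, handles):
        self.window_handles = list(handles)
        self.current_window_handle = self.window_handles[0]
        self.switch_to = _FakeSwitchTo(self)


driver = FakeDriver(["main"])
before = set(driver.window_handles)      # remember the windows we already know
no_new = switch_to_new_window(driver, before)

driver.window_handles.append("popup")    # simulate the site opening a window
switched = switch_to_new_window(driver, before)
print(no_new, switched, driver.current_window_handle)
```

If the helper returns None after the page change, no new window was opened, and the pagination most likely happens inside the same window.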

Shubham Jain

Try the following code. You need to click each pagination link to reach the next page, and re-parse the page source after every click. This way you will get all 100 review comments.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time

url = "https://www.innisfree.com/kr/ko/ProductReviewList.do"
chromedriver = r'C:\Users\hhm\Downloads\chromedriver_win32\chromedriver.exe'
# Selenium 3 style call; in Selenium 4 the path is passed via a Service object instead.
driver = webdriver.Chrome(chromedriver)
driver.get(url)

# i is the number of the NEXT page to open, so pages 1..10 get scraped.
for i in range(2, 12):
    time.sleep(2)  # give the review list time to load
    # Re-parse the live page source on every iteration.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    nTypes = soup.select('.reviewList ul .newType div[class^=reviewCon] .reviewConTxt')
    for nType in nTypes:
        product = nType.select_one('.pdtName').text
        print(product)
    if i == 11:
        break  # page 10 was the last one; nothing left to click
    # Wait for the link to page i, then click it through JavaScript.
    nextbutton = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//span[@class='num']/a[text()='" + str(i) + "']")))
    driver.execute_script("arguments[0].click();", nextbutton)
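
The code in the question most likely keeps printing page 1 because a BeautifulSoup object is a snapshot of the HTML string it was built from: `goPage(2)` changes the page inside the browser, not the already-parsed `soup`. Below is a minimal sketch of the re-parse pattern. `scrape_pages`, `extract_names`, and `FakeDriver` are hypothetical names for illustration, and the fake driver replaces Selenium so the sketch runs without a browser; only `execute_script` and `page_source` correspond to real WebDriver members.

```python
import re


def extract_names(html):
    """Hypothetical parser: text of elements whose class is exactly 'pdtName'."""
    return re.findall(r'class="pdtName">([^<]*)<', html)


def scrape_pages(driver, parse, last_page, wait=lambda: None):
    """Collect parse(driver.page_source) for pages 1..last_page.

    page_source is read again after every navigation, so each page is
    parsed from fresh HTML rather than from an old snapshot.
    """
    results = []
    for page in range(1, last_page + 1):
        if page > 1:
            driver.execute_script(f"goPage({page})")  # the site's own JS function
            wait()  # e.g. lambda: time.sleep(2), or a WebDriverWait
        results.append(parse(driver.page_source))
    return results


class FakeDriver:
    """Stand-in for a Selenium driver (assumption, demonstration only)."""
    def __init__(self):
        self.page_source = '<p class="pdtName">product A</p>'

    def execute_script(self, script):
        # Pretend the site swapped in the reviews for the requested page.
        self.page_source = f'<p class="pdtName">result of {script}</p>'


pages = scrape_pages(FakeDriver(), extract_names, last_page=2)
print(pages)
```

With a real driver, `parse` could instead build `BeautifulSoup(html, 'html.parser')` and run the selector from the question; the important part is that it receives the current `driver.page_source` on every page.
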
KunduK