
It works fine when I run it on my Windows desktop, but it doesn't work when I run it on AWS Lambda. I know how to crawl websites using Selenium, and the code works fine on other websites, but it fails only on certain sites such as the ones below.

  1. https://despread-creative.notion.site/6f7b61a2f09b41488d63492c665aadf4?v=1f42aaf6a4d546839700383df006b862
  2. http://korbit.co.kr/market/research
  3. https://xangle.io/insight/research

    driver.get('http://korbit.co.kr/market/research')
    time.sleep(5)
    driver.find_element('xpath', '//*[@id="list"]/div/div[1]/a/h3').text

The simple code above didn't work.

It fails with the message: unable to locate element.

I built my Lambda layer for Selenium with these commands (the driver itself is started roughly as sketched after the list):

1. pip3.7 install -t selenium/python/lib/python3.7/site-packages selenium==3.8.0
2. curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
3. curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-41/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
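
For reference, this is roughly how I start the driver inside the handler. The /opt paths are an assumption based on how I unpack the layer, and the flags are the ones commonly needed to run headless-chromium inside Lambda's sandbox:

    from selenium import webdriver

    # Assumed layer layout: chromedriver and headless-chromium unpacked under /opt
    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = '/opt/headless-chromium'

    # Flags typically required to run headless-chromium inside Lambda
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--window-size=1280x1024')

    # /tmp is the only writable directory in Lambda
    chrome_options.add_argument('--homedir=/tmp')
    chrome_options.add_argument('--user-data-dir=/tmp/user-data')
    chrome_options.add_argument('--disk-cache-dir=/tmp/cache')

    driver = webdriver.Chrome(
        executable_path='/opt/chromedriver',  # Selenium 3.x style keyword
        chrome_options=chrome_options,
    )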

Please help.

  • Have you tried increasing the sleep period before looking for the element? When I tried to hit the website, it took me longer than usual to get to the page. I'm assuming there's a delay because of latency between regions from where the requests are coming from. In which region is your Lambda deployed? – Pv66 Feb 17 '23 at 12:47
  • I did, but it doesn't work. I tried setting the sleep period to 60 seconds. My Lambda is in Seoul (ap-northeast-2). – ddabongcoin Feb 17 '23 at 13:02
  • Which element is `//*[@id="list"]/div/div[1]/a/h3`? – undetected Selenium Feb 17 '23 at 23:23
  • Yes, in the "korbit" crawling case. Does it work on your Lambda? – ddabongcoin Feb 18 '23 at 19:59

1 Answer


To extract the text STO 시리즈 1: 블록체인과 시장 활성화 from the website, since the element is a dynamic element you ideally need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://korbit.co.kr/market/research')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[href*='/market/research']>img +h3"))).text)
    driver.quit()
    
  • Using XPATH:

    driver.get('https://korbit.co.kr/market/research')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@href, '/market/research')]/img/following-sibling::h3[1]"))).text)
    driver.quit()
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

undetected Selenium
  • I tried your recommendation, but it still doesn't work; I get a TimeoutException. I increased the wait time to 60 seconds or more, but it's still the same. It might be that the several websites above can't be opened from my AWS Lambda. Help please. – ddabongcoin Feb 19 '23 at 13:21