
There's a web novel site, noveltop (.net), and every chapter page of a web novel has a select drop-down that lets you pick the chapter to jump to.

Using Selenium/Python with the Firefox driver (or Chrome), I've dumped the page source, and all the HTML shows is:

<div class="c-selectpicker selectpicker_chapter chapter-selection chapters_selectbox_holder" data-chapter="chapter-892-892-bet-limit-shocking-change" data-manga="1189459" data-type="content" data-vol="0">
</div>

So the select is clearly not being loaded, or its script is not running. I have tried various ways of waiting for the page to load fully, including:

  1. WebDriverWait(self.selenium_driver, 10).until(EC.presence_of_element_located((By.XPATH, '//body')))
  2. A loop polling document.readyState:

         while True:
             page_state = self.selenium_driver.execute_script('return document.readyState;')
             print("wait4js: page state is:", page_state)
             if page_state == "complete":
                 break

  3. self.selenium_driver.implicitly_wait(2)
  4. NEW EDIT: I've also waited for the element's presence, locating it both by XPath/class and by its attributes, and used the expected condition for the select to be clickable. The dynamic JS doesn't seem to kick in; I've tried both the Chrome and Firefox drivers.

I can't find the elements I need to gather the options. Obviously the page loads them at run time, adding the select and all its options to the div.

It should look like this:

<div class="c-selectpicker selectpicker_chapter chapter-selection chapters_selectbox_holder" data-manga="1248315" data-chapter="chapter-1-invincible-after-a-hundred-years-of-seclusion" data-vol="0" data-type="content">
  <label>
    <select class="c-selectpicker selectpicker_chapter selectpicker single-chapter-select" style="" for="volume-id-0">
      <option class="short " data-limit="40" value="chapter-1-invincible-after-a-hundred-years-of-seclusion" data-redirect="https://noveltop.net/novel/i-stayed-at-home-for-a-century-when-i-emerged-i-was-invincible/chapter-460-460-conflicts-and-chaos-part-2/">Chapter 460  -  460 Conflicts And Chaos (Part 2)</option>

Can someone teach me how to figure this out, so that I can use driver.find_elements to gather all the option elements?

Is it an iframe? Do I need to click on the div, or run some JavaScript attached to an HTML element? Help... I'm deeply frustrated with this code weirdness!

Thank you in advance if you can help me. I'm new to Selenium, so please be kind.

Jonn Doe
  • What's the URL? – MendelG Jan 01 '23 at 18:08
  • An example would be found on: https://noveltop.net/novel/i-stayed-at-home-for-a-century-when-i-emerged-i-was-invincible/chapter-460-460-conflicts-and-chaos-part-2/ – Jonn Doe Jan 01 '23 at 19:41
  • As I'm new to this: does Selenium run the JS when the page is loaded? I.e., what triggers the JS to dynamically populate the empty div? I'm lost. – Jonn Doe Jan 01 '23 at 19:48

1 Answer


You were close with your point 1, but instead of waiting for the whole page, wait for the element you actually need, e.g.:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://noveltop.net/novel/i-stayed-at-home-for-a-century-when-i-emerged-i-was-invincible/chapter-460-460-conflicts-and-chaos-part-2/")
# Wait up to 10 s for the <select> itself, which the page's JavaScript injects into the div.
# Waiting for //body is not enough: <body> exists long before that script has run.
chapter_select = WebDriverWait(driver, timeout=10).until(ec.presence_of_element_located((By.XPATH, "/html/body/div[1]/div/div/div/div/div/div/div[1]/div/div[1]/div[1]/div/div[2]/div/label/select")))
# The innerHTML of the <select> contains all of its <option> elements
print(chapter_select.get_attribute("innerHTML"))
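The `innerHTML` printed above is a plain string of `<option>` tags. If you only need each chapter's value, redirect URL and label, a minimal sketch using Python's standard-library `html.parser` can pull them out of that string. The `data-redirect` attribute name is taken from the sample markup in the question; `OptionCollector` and `parse_options` are hypothetical names, not part of Selenium.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collects value, data-redirect and visible text of every <option> tag.

    Hypothetical helper: feed it the string returned by
    get_attribute("innerHTML") on the chapter <select>.
    """

    def __init__(self):
        super().__init__()
        self.options = []      # one dict per <option>
        self._current = None   # option whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            attrs = dict(attrs)
            self._current = {
                "value": attrs.get("value"),
                "redirect": attrs.get("data-redirect"),  # attribute seen in the question's markup
                "text": "",
            }
            self.options.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "option" and self._current is not None:
            # Collapse runs of whitespace in the visible label
            self._current["text"] = " ".join(self._current["text"].split())
            self._current = None


def parse_options(inner_html):
    """Return a list of dicts, one per <option> found in inner_html."""
    parser = OptionCollector()
    parser.feed(inner_html)
    parser.close()
    return parser.options


# Sample option copied from the markup in the question
sample = (
    '<option class="short " data-limit="40" '
    'value="chapter-1-invincible-after-a-hundred-years-of-seclusion" '
    'data-redirect="https://noveltop.net/novel/i-stayed-at-home-for-a-century-when-i-emerged-i-was-invincible/chapter-460-460-conflicts-and-chaos-part-2/">'
    'Chapter 460  -  460 Conflicts And Chaos (Part 2)</option>'
)
for opt in parse_options(sample):
    print(opt["value"], "->", opt["text"])
```

If you would rather stay inside Selenium, calling `find_elements(By.TAG_NAME, "option")` on the `<select>` element returned by the wait gives you the same options as WebElements.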

By the way, to check whether your element shows up AT ALL, you can add a fixed delay with time.sleep(seconds), but don't use it in real code, only for research.
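The presence of the `<select>` does not by itself prove that its `<option>` list has been filled in. `WebDriverWait.until()` accepts any callable that takes the driver and returns a truthy value once the wait is over, so a minimal sketch of a condition that waits for at least one option could look like this. The CSS selector `select.single-chapter-select option` is an assumption based on the markup shown in the question, and `options_loaded` is a hypothetical name.

```python
class options_loaded:
    """Wait condition: truthy once at least `minimum` elements match `css`.

    Hypothetical helper; the default selector is guessed from the question's markup.
    """

    def __init__(self, css="select.single-chapter-select option", minimum=1):
        self.css = css
        self.minimum = minimum

    def __call__(self, driver):
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR;
        # find_elements returns an empty list (no exception) when nothing matches
        found = driver.find_elements("css selector", self.css)
        # Returning the list makes WebDriverWait.until() hand it back to the caller
        return found if len(found) >= self.minimum else False
```

With a real driver it would be used as `WebDriverWait(driver, 10).until(options_loaded())`. `until()` re-polls every 0.5 s by default and raises `TimeoutException` if the options never appear, which is itself a useful hint that the browser is not on the page you expect.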

Update: stopping the site from recognizing automated scripts

from selenium import webdriver

# Hypothetical placeholders: point these at your own Chrome user-data folder and profile name
user_data_dir = r"C:\path\to\Chrome\User Data"
profile_directory = "Default"

chrome_options = webdriver.ChromeOptions()
# we should pretend to be a human
chrome_options.add_argument('start-maximized')
chrome_options.add_argument('--disable-web-security')
chrome_options.add_argument('--allow-running-insecure-content')
# personalize chrome profile (reuse an existing profile's cookies and history)
chrome_options.add_argument(f'--user-data-dir={user_data_dir}')
chrome_options.add_argument(f'--profile-directory={profile_directory}')
chrome_options.add_argument('--enable-sync')
# turn off recognition of automation by browser
chrome_options.add_argument('--disable-extensions')
chrome_options.add_experimental_option('useAutomationExtension', False)
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])

driver = webdriver.Chrome(options=chrome_options)
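To see whether the page can still tell it is being automated, one thing to check is `navigator.webdriver`, which the W3C WebDriver specification says reports `true` while the browser is under WebDriver control. A minimal sketch, assuming an already-created Selenium driver; `looks_automated` is a hypothetical helper name, and a `False` result does not guarantee that a service such as Cloudflare will accept the session.

```python
def looks_automated(driver):
    """Return True if the current page sees navigator.webdriver as true.

    Hypothetical helper; `driver` is any Selenium WebDriver instance.
    """
    # execute_script runs the JavaScript in the current page and returns its result;
    # navigator.webdriver may come back as True, False or None depending on the browser
    return bool(driver.execute_script("return navigator.webdriver"))
```

Call it after `driver.get(...)`, since the value is read from whatever page is currently loaded.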
kadis
  • Just found out it's a Cloudflare-protected site. P.S. I have also TRIED WebDriverWait for the element with presence_of_element_located, and also visibility_of_all_elements_located using By.XPATH with the selector '//option[@value and contains(@value, "chapter")]'. I think the culprit is either Cloudflare itself, or the dynamic JS not loading because of Cloudflare. – Jonn Doe Jan 01 '23 at 20:51
  • With the code above I could get all the options, so it's not a coding problem. – kadis Jan 01 '23 at 20:54
  • How do you connect to the webdriver? Try creating the Chrome session under a specific Chrome profile; sometimes being personalized helps, as do some additional Chrome options. I've updated my answer with a snippet I've used to avoid captchas; see if any of it helps. – kadis Jan 01 '23 at 21:07
  • And maybe this could help you too: https://stackoverflow.com/questions/71518406/how-to-bypass-cloudflare-browser-checking-selenium-python – kadis Jan 01 '23 at 21:17
  • Thanks kadis, will try your suggestion – Jonn Doe Jan 02 '23 at 05:02
  • No luck; I tried the solution and there are still no values from the find_elements call. – Jonn Doe Jan 02 '23 at 05:35
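When the same code returns all the options for one person and nothing for another, it is worth checking whether the driver is actually looking at a Cloudflare check page instead of the chapter. A minimal heuristic sketch, assuming the interstitial can be recognized by its title ("Just a moment..." or "Attention Required!") or by a `/cdn-cgi/challenge-platform/` script path in the HTML; those markers are assumptions about Cloudflare's current pages and may change, and `on_cloudflare_challenge` is a hypothetical name.

```python
# Assumed markers of a Cloudflare check/block page; not a stable or documented API
CHALLENGE_TITLES = ("just a moment", "attention required")
CHALLENGE_MARKERS = ("/cdn-cgi/challenge-platform/",)


def on_cloudflare_challenge(driver):
    """Heuristic: True if the current page looks like a Cloudflare check page.

    Hypothetical helper; `driver` is any Selenium WebDriver instance
    (only its .title and .page_source properties are used).
    """
    title = (driver.title or "").lower()
    if any(title.startswith(t) for t in CHALLENGE_TITLES):
        return True
    source = driver.page_source or ""
    return any(marker in source for marker in CHALLENGE_MARKERS)
```

Calling it right after `driver.get(...)` separates the two possible problems: a waiting issue (the real page loaded but the options had not appeared yet) versus an access issue (the real page was never served, so the options were never going to appear).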