
For some reason I cannot access the number in the `<a>` tag and the title in the `<span>` tag using XPath.

This is the HTML:

<div class="style-scope classification-tree">
            <state-modifier class="code style-scope classification-tree" act="{&quot;type&quot;: &quot;QUERY_ADD_CPC&quot;, &quot;cpc&quot;: &quot;$cpc&quot;}" first="true" data-cpc="C07C311/51">
                  <a id="link" href="/?q=C07C311%2f51" class="style-scope state-modifier">C07C311/51</a>
            </state-modifier>
            <span class="description style-scope classification-tree">Y being a hydrogen or a carbon atom</span>
          </div>

I've tried this code so far:

Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//state-modifier[@class='code style-scope classification-tree']//a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]"))).text

Class_Content_title = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree']//span[contains(@class, 'description style-scope classification-tree')]"))).text

It's supposed to get the text inside the `<a>` and the `<span>`.

However, this error occurs:

Traceback (most recent call last):
  File "<ipython-input-2-dfe4f1a9b070>", line 97, in openURL
    Class_Content = Class(driver, Current_Content)
  File "c:\Users\jyg\Desktop\MT\Extract_data_2.py", line 57, in Class
    Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree']//state-modifier[contains(@class, 'code style-scope classification-tree']/child::a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]"))).text
  File "C:\Users\jyg\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

Could someone please help? Thank you!

undetected Selenium
Jen G
  • Can you try this: `Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.ID, "link"))).text; Class_Content_title = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//span[@class='description style-scope classification-tree']"))).text` – Appu Mistri Jul 22 '19 at 13:30
  • If you look at the traceback, the error is thrown on a different `xpath`, not the `xpath` you have posted. – KunduK Jul 22 '19 at 13:34
  • You are combining WebDriverWait & element.text together. WebDriverWait returns nothing. Maybe it is a good idea to try these as separate steps. – Sureshmani Kalirajan Jul 22 '19 at 13:34
  • The id='link' appears in several places on the page, so looking for it alone is not sufficient. I tried to combine two attributes, but it doesn't work. – Jen G Jul 22 '19 at 15:25
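As KunduK points out, the XPath in the traceback is not the one posted: in `contains(@class, 'code style-scope classification-tree']` the `contains(` is never closed, so that expression is malformed. A minimal sketch of a sanity check for this kind of typo, using a hypothetical helper `unbalanced_brackets` (not part of Selenium). It only checks `()`/`[]` nesting and ignores brackets inside quoted strings; it is not a full XPath parser.

```python
def unbalanced_brackets(xpath):
    """Return a list of ( ) / [ ] nesting problems in an XPath string.

    Rough sanity check only (hypothetical helper, not a full XPath parser):
    brackets inside single- or double-quoted string literals are ignored.
    """
    pairs = {")": "(", "]": "["}
    stack = []      # (bracket, index) of brackets that are still open
    problems = []
    quote = None    # quote character of the literal we are inside, if any
    for i, ch in enumerate(xpath):
        if quote:
            if ch == quote:
                quote = None          # end of string literal
        elif ch in ("'", '"'):
            quote = ch                # start of string literal
        elif ch in "([":
            stack.append((ch, i))
        elif ch in ")]":
            if stack and stack[-1][0] == pairs[ch]:
                stack.pop()           # properly matched closer
            else:
                problems.append(f"unexpected {ch!r} at index {i}")
    problems.extend(f"unclosed {ch!r} at index {i}" for ch, i in stack)
    if quote:
        problems.append("unterminated string literal")
    return problems


# XPath from the question's code
POSTED = "//state-modifier[@class='code style-scope classification-tree']//a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]"
# XPath that actually ran, copied from the traceback
TRACEBACK = "//div[@class='style-scope classification-tree']//state-modifier[contains(@class, 'code style-scope classification-tree']/child::a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]"

print(unbalanced_brackets(POSTED))     # no problems
print(unbalanced_brackets(TRACEBACK))  # reports the stray ']' and the brackets left open
```

Running it on the traceback's XPath flags the `]` that closes a `[` while a `contains(` is still open; the posted XPath comes back clean.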

2 Answers


Here is the xpath to use.

Code:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
# until() returns the located WebElement
Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree']//state-modifier[contains(@class, 'code style-scope classification-tree')]//a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]")))
# now get the text of the <a> element
print(Class_Content_year.text)
# now get the text from the span
print(driver.find_element(By.XPATH, "//div[@class='style-scope classification-tree']//span[@class='description style-scope classification-tree']").text)

Here are the other possible xpaths:

//div[@class='style-scope classification-tree']//a[@class='style-scope state-modifier']

For the span, you can use the XPath below.

//div[@class='style-scope classification-tree']//span[@class='description style-scope classification-tree']
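The XPaths in this answer can be sanity-checked offline against the HTML from the question. A minimal sketch using Python's standard-library ElementTree, assuming the snippet is representative of the live page. ElementTree implements only a subset of XPath (no `contains()` or `not()`), so this works only for the exact-match XPaths, and they must be prefixed with `.` because ElementTree accepts only relative paths on an element. If they match here but Selenium still times out, the selector itself is probably not the problem; timing, hidden duplicates on the live page, or a different XPath actually being executed are more likely causes.

```python
import xml.etree.ElementTree as ET

# The question's HTML snippet, wrapped in <body>. It happens to be well-formed
# XML (&quot; is a valid XML entity too), so ElementTree can parse it.
SNIPPET = """<body>
<div class="style-scope classification-tree">
  <state-modifier class="code style-scope classification-tree" act="{&quot;type&quot;: &quot;QUERY_ADD_CPC&quot;, &quot;cpc&quot;: &quot;$cpc&quot;}" first="true" data-cpc="C07C311/51">
    <a id="link" href="/?q=C07C311%2f51" class="style-scope state-modifier">C07C311/51</a>
  </state-modifier>
  <span class="description style-scope classification-tree">Y being a hydrogen or a carbon atom</span>
</div>
</body>"""

root = ET.fromstring(SNIPPET)

# The answer's XPaths with a leading "." (relative to the <body> element).
# They use only exact-match [@attr='value'] predicates, which ElementTree supports.
A_XPATH = ".//div[@class='style-scope classification-tree']//a[@class='style-scope state-modifier']"
SPAN_XPATH = ".//div[@class='style-scope classification-tree']//span[@class='description style-scope classification-tree']"

code = root.find(A_XPATH)
title = root.find(SPAN_XPATH)
print(code.text)   # C07C311/51
print(title.text)  # Y being a hydrogen or a carbon atom
```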
supputuri

To extract the text C07C311/51, use visibility_of_element_located() instead of presence_of_element_located(). You can use the following Locator Strategy:

  • Using XPATH:

    driver.get("https://patents.google.com/patent/JP2009517369A/en?oq=JP2009517369")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree' and not(@hidden)]/state-modifier[@class='code style-scope classification-tree']/a[@class='style-scope state-modifier']"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
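Why `not(@hidden)`: `visibility_of_element_located()` looks up only the first element matching the locator and checks whether that element is displayed. If the page contains several matching elements and the first one stays hidden, the wait can time out even though a visible match exists; and if the `hidden` attribute sits on an ancestor rather than on the div itself, `not(@hidden)` will not filter it out. A minimal sketch of an alternative, using a hypothetical helper `text_of_first_visible` (not part of Selenium): a custom wait condition that checks every match and returns the text of the first displayed one. It relies only on `WebDriverWait.until()` accepting any callable that takes the driver and returning that callable's first truthy result.

```python
def text_of_first_visible(xpath):
    """Build a wait condition for WebDriverWait.until() (hypothetical helper).

    The returned callable finds *all* elements matching xpath and returns the
    text of the first one that is displayed, or False so that WebDriverWait
    keeps polling while none is displayed yet.
    """
    def condition(driver):
        # "xpath" is the string value of selenium's By.XPATH
        for element in driver.find_elements("xpath", xpath):
            if element.is_displayed():
                return element.text
        return False
    return condition
```

Usage would look like `print(WebDriverWait(driver, 20).until(text_of_first_visible("//state-modifier[@class='code style-scope classification-tree']/a")))`. If the page re-renders while polling, `is_displayed()` can raise `StaleElementReferenceException`, which `WebDriverWait` does not ignore by default; passing `ignored_exceptions=[StaleElementReferenceException]` to `WebDriverWait` is one way to handle that. Selenium's built-in `EC.visibility_of_any_elements_located()` takes a similar approach and returns the list of visible matches.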
undetected Selenium
  • It unfortunately did not work. I'm afraid it has something to do with the element being dynamic. See https://patents.google.com/patent/JP2009517369A/en?oq=JP2009517369 under "Classification". I'm trying to get the code number there. – Jen G Jul 22 '19 at 15:22
  • @JenG Checkout the updated answer and let me know the status. – undetected Selenium Jul 22 '19 at 16:25
  • Thank you for the edit! However, it still raises a TimeoutException. I figured it had something to do with the hidden elements, but it seems it still cannot find it. – Jen G Jul 24 '19 at 08:24