
I want to scrape the text of a few fields based on their web elements (XPath, classes, etc.).

<div class = myOnlyElement>
  <div> ......
    <div class = afafasf> ......</div>
    <div class = klklkl> ......
      <div class = qwqwqwq> ......
        <div class = reaction> text i need</div>
      </div>
    </div>
  </div>
</div>

<div class = myElement>
  <div> ......
    <div class = dfdfdf> ......</div>
    <div class = ghgghghg> ......
      <div class = erererere> ......
        <div class = reaction> text i don't need</div>
      </div>
    </div>
  </div>
</div>

Suppose the page's HTML looks like this. I locate the outer element like this:

myelem = driver.find_element_by_classname('myOnlyElement')

Now I only want to pick the element with class "reaction" that contains the text I need. I am doing it like this:

myelem.find_element_by_classname('reaction')

If this class is present inside myOnlyElement, it is captured correctly, but in some cases the result is the class = "reaction" element whose text is "text i don't need".

I hope I have explained my question clearly. Can you please help me?

Abdul Aziz
  • driver.find_element_by_class_name is the proper syntax (not find_element_by_classname). – Arundeep Chohan Sep 23 '20 at 22:02
  • Also check that you actually got myelem, or use waits. – Arundeep Chohan Sep 23 '20 at 22:19
  • You can use find_elements to get a list of the elements matching the class, then iterate over it and read each element's innerHTML with thiselement.get_attribute('innerHTML') to find out whether it is the text you need. – pcalkins Sep 23 '20 at 22:19
  • The thing is, whenever I visit a link, I want to get information from the very first element, whether or not it is present. If the element is present, get its text; if it is not present, print "no text". – Abdul Aziz Sep 24 '20 at 14:11

2 Answers


My friend, the best solution for this kind of thing is to right-click the text you want on the web page and inspect it. Then right-click the element in the DOM inspector and click Copy -> Copy Full XPath. You might then need to use .text or .source to get those values, so try it and play around.

Amit Shah

To print the text "text i need", you can use either of the following locator strategies:

  • Using css_selector and get_attribute():

    print(driver.find_element_by_css_selector("div.myOnlyElement div.reaction").get_attribute("innerHTML"))
    
  • Using xpath and text attribute:

    print(driver.find_element_by_xpath("//div[@class='myOnlyElement']//div[@class='reaction']").text)
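
As a quick offline check of why scoping the locator to myOnlyElement matters, the sketch below applies the same XPath idea to a simplified, well-formed copy of the question's markup. It uses Python's standard-library xml.etree.ElementTree rather than Selenium, so no browser is needed. The simplified markup (quoted attributes, a wrapping body element, placeholders removed) is an assumption made for illustration, and ElementTree supports only a subset of XPath, which is why the paths start with .// here.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the markup in the question (assumption).
PAGE = """
<body>
  <div class="myOnlyElement">
    <div><div class="klklkl"><div class="qwqwqwq">
      <div class="reaction">text i need</div>
    </div></div></div>
  </div>
  <div class="myElement">
    <div><div class="ghgghghg"><div class="erererere">
      <div class="reaction">text i don't need</div>
    </div></div></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Scoped: only div.reaction elements that are descendants of div.myOnlyElement.
scoped = root.findall(".//div[@class='myOnlyElement']//div[@class='reaction']")

# Unscoped: every div.reaction in the document, wanted or not.
unscoped = root.findall(".//div[@class='reaction']")

scoped_texts = [el.text for el in scoped]
unscoped_texts = [el.text for el in unscoped]

print(scoped_texts)
print(unscoped_texts)
```

The scoped path yields only the wanted element, while the unscoped one also picks up the reaction div under myElement.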
    

Ideally, to print the text "text i need", you should induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and get_attribute():

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.myOnlyElement div.reaction"))).get_attribute("innerHTML"))
    
  • Using XPATH and text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='myOnlyElement']//div[@class='reaction']"))).text)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
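
For the case raised in the comments, where the element may be absent and "no text" should be printed instead, one possible sketch is shown below. It relies on find_elements returning an empty list when nothing matches, rather than raising as find_element does. The helper name reaction_text and its default parameter are assumptions made for illustration. The selector is the one used above, and the string "css selector" is the value of By.CSS_SELECTOR, spelled out so the helper itself needs no imports.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, written out as a
# plain string so this helper has no import-time dependency on Selenium.
CSS_SELECTOR = "css selector"


def reaction_text(driver, default="no text"):
    """Return the text of the first div.reaction inside div.myOnlyElement.

    Hypothetical helper: returns `default` when no such element exists.
    `driver` is expected to be a Selenium WebDriver (or a WebElement).
    """
    # find_elements returns an empty list instead of raising when nothing matches.
    matches = driver.find_elements(CSS_SELECTOR, "div.myOnlyElement div.reaction")
    if not matches:
        return default
    return matches[0].text
```

After loading a page with driver.get(...), calling print(reaction_text(driver)) would print either the element's text or "no text". Because the selector is anchored at div.myOnlyElement, a reaction div under myElement is never picked up as a fallback.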


undetected Selenium