
In my program I sometimes need to scrape the text of an invisible/hidden web element. I'm aware that WebDriver normally returns only visible text, and that hidden/invisible text can be scraped with one of the following methods (as suggested in this and this thread on SO):

JavascriptExecutor js = (JavascriptExecutor) driver;
scrapedText = js.executeScript("return arguments[0].innerHTML;", webElement).toString();

Or by calling:

element.getAttribute("textContent")

element.getAttribute("innerText")

element.getAttribute("innerHTML")

While both of these approaches work, they retrieve not only hidden text but also content that the getText() method would not normally return. For example, in the following HTML:

<div class="a-section a-spacing-none">
<a id="brand" class="a-link-normal" href="/abc-d/b/ref=w_bl_sl_l_ap_ap_web_258XXX11?ie=UTF8&node=258XXX11&field-lbr_brands_browse-bin=abc+d">
<img id="brand" src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/brands/byline-logo/25xxx11._CB520xxx1_SR120,50_.jpg" alt=""/>
</a>
</div> 

textContent, innerText or innerHTML will all return the <img> element, even though I'm trying to read the 'href' attribute (using the XPath '//a[contains(@href, 'brands_browse-bin')]').

In other words, I'm trying to create a generic solution where my program always identifies invisible/hidden elements without picking up additional elements, as it does with textContent, innerText or innerHTML. Basically, I want the same result as calling getText(), with the only exception that it includes hidden elements.

Is this possible?

Thanks

Update:

If you navigate to https://www.amazon.com/dp/B01H4LBIVC and try to scrape the price (via .//*[@id='priceblock_ourprice'], for example), it will not work because the element is not visible (I'm aware I could make it visible by clicking 'One-time Purchase'). If I retrieved the element via one of the methods listed above, I would get the price, but the same method would also retrieve the wrong value in the HTML sample provided above. If there were a method that identifies hidden elements (similar to getText()) but does not automatically include "innerHTML" etc., this issue would not be present. In short, I need a generic solution that will identify the price (which is hidden in the above example) and also identify the correct element in the HTML snippet above.

S.O.S
  • Hi @S.O.S, can you try this code? `JavascriptExecutor js = (JavascriptExecutor) driver; WebElement element = driver.findElement(By.id("brand")); String scrapedText = js.executeScript ("return arguments[0].href;", element).toString();` – Ali Mar 19 '19 at 16:39
  • @AliCSE Thanks for your reply. This works, however I'm trying to create a generic solution where my program 1) Identifies both visible and non-visible elements 2) Solution works regardless of the specific HTML attribute. While the said solution works if attribute is 'href' it won't work when attribute changes. Basically, I want to identify exactly the same elements as with getText() but with added option of identifying hidden elements. Updating question w some additional details.. – S.O.S Mar 19 '19 at 16:50

1 Answer


In the example you gave of retrieving the price from the Amazon product, the three options will all return the same value because there is nothing inside the element except text.

<span id="priceblock_ourprice" class="a-size-medium a-color-price">$26.99</span>

The difference between those three options shows up when there is formatting or other HTML elements inside. For example, if you use .innerHTML on the made-up example HTML below

<span id="priceblock_ourprice" class="a-size-medium a-color-price"><strong>$26.99</strong></span>

it would return <strong>$26.99</strong> instead of just $26.99.
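To make the difference concrete, here is a minimal sketch that runs outside the browser. It uses Java's built-in org.w3c.dom parser as a stand-in, on the assumption that Node.getTextContent() behaves like the browser's textContent for simple, well-formed fragments. The class name TextContentDemo and the shortened href and src values are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class TextContentDemo {
    // Parses a small, well-formed markup fragment and returns its root element.
    // Checked parser exceptions are wrapped so callers stay simple.
    static Element parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // Nested formatting: textContent yields only the text, no tags.
        Element price = parse("<span id=\"priceblock_ourprice\"><strong>$26.99</strong></span>");
        System.out.println(price.getTextContent());

        // A link that wraps only an image has no text at all;
        // the URL lives in the href attribute and must be read as an attribute.
        Element link = parse("<a id=\"brand\" href=\"/abc-d/b/example\"><img src=\"logo.jpg\" alt=\"\"/></a>");
        System.out.println("[" + link.getTextContent() + "]");
        System.out.println(link.getAttribute("href"));
    }
}
```

The second case is the one from the question: the text content of the link is empty, so the href has to be requested explicitly as an attribute.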

The simplest option (and the one you seem to want) is to always use .textContent. It will only return the contained text (never HTML tags, etc). At that point, it's up to you to properly provide a locator to find the element that contains the text you want.
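One possible way to wrap this up generically is sketched below. It is only a sketch: HiddenText, normalize and textOf are hypothetical names, and the helper takes a Function instead of a WebElement so that it compiles without Selenium on the classpath. The whitespace collapsing is an approximation of what getText() does, not an exact reimplementation.

```java
import java.util.function.Function;

final class HiddenText {
    // Collapses runs of whitespace and trims the ends, which roughly
    // approximates the cleanup getText() applies to visible text.
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.replaceAll("\\s+", " ").trim();
    }

    // attributeReader is assumed to look up a named attribute or property
    // on one element, e.g. a method reference to WebElement.getAttribute.
    static String textOf(Function<String, String> attributeReader) {
        return normalize(attributeReader.apply("textContent"));
    }
}
```

With Selenium, a call would presumably look like HiddenText.textOf(element::getAttribute), where element is the WebElement found by your locator. It works the same way whether or not the element is displayed, since textContent does not depend on visibility.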

If you want more details, this answer has a more in-depth explanation of the differences between the three (and others not mentioned).

JeffC