In my program I sometimes need to scrape the text of an invisible/hidden web element. I'm aware that WebDriver's getText()
normally returns only visible text, and that hidden/invisible text can be scraped via one of the following methods (as suggested in this and this thread on SO):
JavascriptExecutor js = (JavascriptExecutor) driver;
scrapedText = js.executeScript ("return arguments[0].innerHTML", webElement).toString();
Or by calling:
element.attribute('textContent')
element.attribute('innerText')
element.attribute('innerHTML')
While both of these approaches work, they retrieve not only the hidden text but also content that getText() would normally not return. For example, in the following HTML:
<div class="a-section a-spacing-none">
<a id="brand" class="a-link-normal" href="/abc-d/b/ref=w_bl_sl_l_ap_ap_web_258XXX11?ie=UTF8&node=258XXX11&field-lbr_brands_browse-bin=abc+d">
<img id="brand" src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/brands/byline-logo/25xxx11._CB520xxx1_SR120,50_.jpg" alt=""/>
</a>
</div>
textContent, innerText or innerHTML will all return the <img element even though I'm trying to identify the 'href' attribute (using the XPath //a[contains(@href, 'brands_browse-bin')]).
In other words, I'm trying to create a generic solution where my program always identifies the text of invisible/hidden elements without also picking up additional content, as happens with textContent, innerText or innerHTML (basically, I want the same result as calling getText(), with the only exception being that it includes hidden elements).
Is this possible?
Thanks
Update:
If you navigate to https://www.amazon.com/dp/B01H4LBIVC and try to scrape the 'price' (via .//*[@id='priceblock_ourprice'], for example), it will not work since the element is not visible (I'm aware I could make it visible by clicking 'One-time Purchase'). If I retrieved the element's text via one of the methods listed above, I would get the price, but the same method would also return the wrong value for the HTML sample provided above. If there were a method that handles hidden elements (similar to getText()) but does not automatically include "innerHTML" etc., this issue would not be present. In short, I need a generic solution that will identify the 'price' (which is hidden in the above example) and also return the correct result for the HTML snippet above.
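For reference, here is a rough sketch of the kind of fallback I have in mind. It is only an illustration under some assumptions: the class and method names (HiddenTextSketch, normalize, textOrHidden) are made up, the whitespace handling only approximates what getText() does, and the two strings are assumed to come from element.getText() and element.getAttribute("textContent"). The sample price string in the usage note is invented too.

```java
// Hypothetical helper, not part of Selenium. It only post-processes two
// strings that the caller has already read from a WebElement.
final class HiddenTextSketch {

    // Collapse runs of whitespace (including non-breaking spaces) and trim.
    // This is an approximation of getText()'s output, not an exact match.
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.replace('\u00a0', ' ').replaceAll("\\s+", " ").trim();
    }

    // visibleText:    assumed to be the result of element.getText()
    // rawTextContent: assumed to be element.getAttribute("textContent")
    // Prefer the visible text; fall back to textContent only when
    // getText() produced nothing (for example, a hidden element).
    static String textOrHidden(String visibleText, String rawTextContent) {
        String visible = normalize(visibleText);
        return visible.isEmpty() ? normalize(rawTextContent) : visible;
    }
}
```

With Selenium this would be called as HiddenTextSketch.textOrHidden(element.getText(), element.getAttribute("textContent")). For a hidden price element where getText() returns an empty string and textContent is something like "  $12.99 ", the result would be "$12.99". For the anchor in the HTML above, which contains no text nodes apart from whitespace, both inputs normalize to an empty string, so the result matches getText(). As far as I understand, textContent never contains markup, which is why the sketch uses it rather than innerHTML.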