Parsing HTML with xpath or cssSelector?

Question

How do I extract just the text portions of these blocks of markup? I am using the Selenium client drivers in Java.

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyLBoldGrey StockStat">Out of stock</span> <span id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

or

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyLLtgry StockStat">Not carried</span> <span class="BodyLLtgry" id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

or

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyMBold StockStatGreen">In stock</span> <span id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

I am trying to get the text portion of the WebElement in each of these variations (i.e. Not carried, In stock, Out of stock). I am very new to Selenium and HTML parsing, so I am having a hard time getting this to work.

I was thinking that it would be something like

WebDriver driver = new FirefoxDriver(profile);
driver.get(Url);
System.out.println(driver.findElement(By.id("STORE_AVAIL")).getText());

Not sure how I would do it with cssSelector but people tell me that is faster. Would this work?

driver.findElement(By.xpath("//li[@id='NOT_PUT_PREF_STORE']./span[@id='STORE_AVAIL']")).getText()

_"The string that I am looking for isnt actually stored in the page source."_ So how does it get displayed? JavaScript? _"but STORE_AVAIL is actually in the page source"_ I think you just contradicted yourself, but it's not clear. — Matt Ball, Apr 25 '12 at 23:58
BTW, what does the code above do? To me, it should print out what you need... — Pavel Janicek, Apr 26 '12 at 05:26

score 0 · Answer 1 · edited May 23 '17 at 10:24

When you use 'View Page Source', the browser shows only the original HTML source. It does not show changes made by AJAX calls, which appears to be how the Walmart page updates that section/element. This question provides a better explanation.

Assuming you are using Firefox (based on the driver you are using), you can go to the page and press Ctrl+Shift+I to bring up the Inspector tool. Select the element you are interested in, then click the [HTML] button (in the Inspector menu) to view the current source.

Note that when you get the element using Selenium WebDriver, you get its current value rather than the original value seen in the page source. So you do not have to worry about what you see in the page source.

Oh, I didn't know that's how it worked. Thanks, I'll try it again now that I know that. — AlbChu, Apr 26 '12 at 04:44

Isaac · Accepted Answer · 2012-04-27T14:32:43.857

When I try to find elements on the page, I always build my locators in this order:

id = driver.findElement(By.id("STORE_AVAIL")).getText();
css selector = driver.findElement(By.cssSelector("span#STORE_AVAIL")).getText();
xpath = driver.findElement(By.xpath("//span[@id='STORE_AVAIL']")).getText();

The id seems to be the fastest and easiest, both for webdriver and for me. id should be unique on the page.

CSS takes a little more investigative work on my part, but webdriver handles it just fine.

Lastly, xpath is sometimes unavoidable (unless you buy the devs a beer and ask nicely to change the application so you can locate it faster; after all, you are testing for them anyway). Locating by xpath with IE is terribly slow, and writing complex xpaths is a drag.

Xpath is also fragile: one small change to the DOM can render your xpath unusable. Then you get to debug/rewrite your xpath (it is as fun as it sounds).

My suggestion is to use Firebug and FirePath addons for Firefox to help you craft your locators.
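If you want to try the XPath locator without launching a browser, here is a minimal sketch. It is not Selenium: it uses the JDK's own javax.xml.xpath package, and it only works because the markup in the question happens to be well-formed XML. The class name StockStatusXPath and the helper listItem are made up for illustration, and the markup is a trimmed-down copy of the question's li element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class StockStatusXPath {
    // Same expression as the xpath locator in this answer.
    static final String STATUS_XPATH = "//span[@id='STORE_AVAIL']";

    // Hypothetical helper: builds a shortened, well-formed copy of the <li>
    // from the question. Only the class and the status text vary.
    static String listItem(String cssClass, String status) {
        return "<li id='NOT_PUT_PREF_STORE'>"
             + "<span id='STORE_AVAIL' class='" + cssClass + "'>" + status + "</span> "
             + "<span id='InYourLocal'>in your local</span> store"
             + "</li>";
    }

    // Parses the markup as XML and returns the text of the first node
    // matched by STATUS_XPATH (an empty string if nothing matches).
    static String statusOf(String markup) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(STATUS_XPATH, new InputSource(new StringReader(markup)))
                    .trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + STATUS_XPATH, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusOf(listItem("BodyLBoldGrey StockStat", "Out of stock")));
        System.out.println(statusOf(listItem("BodyLLtgry StockStat", "Not carried")));
        System.out.println(statusOf(listItem("BodyMBold StockStatGreen", "In stock")));
    }
}
```

The class attribute differs between the three variations but the id does not, which is why a locator keyed on the id keeps working for all of them.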

score 0 · Answer 3 · answered Nov 06 '14 at 04:55

I tried this with the following HTML snippet:

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyLBoldGrey StockStat">Out of stock</span> <span id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

I am using the following code to solve it. I get the list of span elements using XPath and loop over it to get the text of each element.

driver.navigate().to("file:///C:/Users/abc/Desktop/test.html");
// Collect every <span> that is a direct child of an <li>
List<WebElement> spanEle = driver.findElements(By.xpath("//li/span"));
for (int i = 0; i < spanEle.size(); i++) {
    System.out.println(spanEle.get(i).getText());
}
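To see which elements //li/span actually matches, here is a rough browser-free sketch. It swaps Selenium for the JDK's javax.xml.xpath package, which is only possible because this snippet is well-formed XML; a real page usually is not. The class name SpanTextLister is made up for illustration, and getTextContent stands in for Selenium's getText, which may differ on a live page (for example with hidden elements).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpanTextLister {
    // The snippet from this answer, with single-quoted attributes
    // so it fits in a Java string literal.
    static final String MARKUP =
          "<li id='NOT_PUT_PREF_STORE' style=''>"
        + "<span id='STORE_AVAIL' class='BodyLBoldGrey StockStat'>Out of stock</span> "
        + "<span id='InYourLocal'>in your local</span> "
        + "<span id='storeRollover_2'>"
        + "<span id='STORE_CITY' class='BodyLBoldLtgry VIBSStore1'>West Hills</span>"
        + "</span> store"
        + "<span id='notSelectOptionSOI'>.</span>"
        + "</li>";

    // Returns the trimmed text of every <span> that is a direct child of an <li>.
    static List<String> spanTexts(String markup) {
        try {
            NodeList spans = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//li/span",
                              new InputSource(new StringReader(markup)),
                              XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < spans.getLength(); i++) {
                // getTextContent() includes the text of nested elements
                texts.add(spans.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate //li/span", e);
        }
    }

    public static void main(String[] args) {
        for (String text : spanTexts(MARKUP)) {
            System.out.println(text);
        }
    }
}
```

Four spans match. The nested STORE_CITY span is not a direct child of the li, so its text shows up only as part of storeRollover_2, and the bare word "store" is not inside any span, so it is not listed at all. If only the stock status is needed, the first match (or a lookup by the STORE_AVAIL id) is enough.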
