Problem Description

I am trying to use xpath to locate the "Messi" node from the HTML below. To minimize coding efforts, I am hoping for a solution that uses an array index, instead of looping through an iterator.

My assumption is that the most standard and simplest API is XPathExpression.evaluate(). If there are better APIs, please kindly share.

By the way, I need to make changes to the DOM Node from the returned result. So, XPathResult.resultType will be set to ORDERED_NODE_ITERATOR_TYPE, and therefore XPathResult.snapshotItem() cannot be used.

HTML Example

<html>
<body>

<div>
    <div>NumberOne</div>
    <div>NumberTwo_Mbappe</div>
    <div>NumberOne</div>
    <div>NumberTwo_Ronaldo</div>
    <div>NumberTwo_Messi</div>
</div>

</body>
</html>

Code to get the XPath Results

Running the code below will return an iterator from the above html.

let xpathIterator = new XPathEvaluator()
                        .createExpression("//*[starts-with(text(), 'NumberTwo')]")
                        .evaluate(
                            document, 
                            XPathResult.ORDERED_NODE_ITERATOR_TYPE
                        );

Existing iterator solution for extracting the n-th item

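The existing XPathResult interface only has an iterateNext() method, so it takes six lines of code to extract the n-th item (n is a zero-based index here):

let n = 2; // zero-based index of the wanted item (the 3rd match)
while (n > 0) {
    xpathIterator.iterateNext(); // skip the matches before it
    n--;
}
let v = xpathIterator.iterateNext();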

Ideal array solution for extracting the n-th item

Since XPath and Chrome are used by millions of people every day, ideally, there should be a way to obtain the n-th item directly using an array index (as the following code shows). I would be surprised if such an API doesn't already exist.

let v = xpathResult[2];

The ideal solution doesn't necessarily need to use XPathExpression.evaluate(). I am open to any solutions that use standard JavaScript functions supported by Chrome.

(Hopefully, we don't need to use a function. If a function must be used, it would be good to have no more than 2 to 3 lines of ESLint-linted code.)

Thanks!

Related Posts

Since XPathResult is not an iterable, the following posts don't apply:

Emma
  • 316
  • 1
  • 7
  • Does this answer your question? [How to use Array.from with a XPathResult?](https://stackoverflow.com/questions/47017441/how-to-use-array-from-with-a-xpathresult) – Earlee May 10 '23 at 01:07
  • Why would you use XPath when working with HTML pages, instead of just using the normal query selector approach, e.g. ```document.querySelectorAll(`.wikitable tr`)```? No reason to treat the page as XML when it isn't XML? – Mike 'Pomax' Kamermans May 10 '23 at 01:09
  • Hi @Earlee, `Does this answer your question? How to use Array.from with a XPathResult?` Array.from() doesn't work, because it requires an iterable, or an array-like object. Array-like objects need to have the "length" property. – Emma May 10 '23 at 01:30
  • @Emma yes, they have a discussion there as to how you could convert it to an array by creating your own function. Then you can finally use array indexing. – Earlee May 10 '23 at 01:32
  • Hi @Mike'Pomax'Kamermans , `Why would you use XPath when working with HTML pages, instead of just using the normal query selector approach, e.g. document.querySelectorAll`, the purpose of my question is to perform the search via XPath, because, for my industry (QA Selenium Development), we often do not have the luxury of using CSS selectors. Sometimes, the frontend developers produce a website with huge tables and no attributes (no class and no id). The only way to search for something is by using xpath to locate the desired text in the table row. CSS selectors don't work on text nodes. – Emma May 10 '23 at 01:46
  • Hi @Earlee `yes they have a discussion there as to how you could convert it to array by creating your own function. then you can finally array indexing.` They did the conversion with 6 lines of code, which is no shorter than the "existing" solution that I provided above. The purpose of this question is to help people who need to manually do xpath searches thousands of times a day, and cannot afford 6 lines of code. – Emma May 10 '23 at 01:49
  • Hi @Mike'Pomax'Kamermans `Why would you use XPath` I have updated my post to perform a text-based XPath search. It will work if you paste the code into Chrome. Thanks! – Emma May 10 '23 at 01:53
  • Hi @Earlee `yes they have a discussion there as to how you could convert it to array by creating your own function.` I am hoping to avoid the need to create a function, because the code will be run in the Chrome console, and the Chrome window will be closed and re-opened hundreds of times a day, which means the function will need to be manually re-created hundreds of times a day. – Emma May 10 '23 at 02:01
  • @Emma thanks. Although of course, this is still really easy to do with "not xpath" by using ```Array.from(document.querySelectorAll(`....`)).filter(e => e.textContent.includes(`be`))```, so I'm still not sure xpath makes more sense than query selecting and then mapping/filtering as needed =) It might help to explain what you're actually trying to achieve, concretely, just in case there's some easy "normal" JS that can just as easily achieve that goal. – Mike 'Pomax' Kamermans May 10 '23 at 02:20
  • Hi @Mike'Pomax'Kamermans, thanks for the Array.filter() suggestion. I have added an HTML example. Please review. How would you use CssSelector to get the three "NumberTwo" nodes? After getting the three nodes, how would you access the 3rd node (the "Messi" node) directly? By the way, the five text nodes aren't necessarily located in `<ul><li>`; they are equally likely to be wrapped by `<ol><li>` or `<table><tr>`. Also, the "NumberTwo" nodes aren't necessarily the 2nd, 4th, and 5th nodes; they are equally likely to be at the 1-2-5 or 1-4-5 or 3-4-5 positions. Thanks! – Emma May 10 '23 at 04:47
  • There are certainly libraries that expose the console functions like `$` or `$x` to "normal" script, e.g. https://github.com/WebReflection/basic-devtools/blob/main/esm/index.js does that. Also, XPath has undergone quite a development from XPath 1, which is all that browsers support, to XPath 3.1, which is what modern libraries like SaxonJS or FontoXPath support. `SaxonJS.XPath.evaluate` can be used to return an array, for instance. – Martin Honnen May 10 '23 at 09:35
  • Hi @MartinHonnen, thanks for the suggestion. How do we install SaxonJS in the Google Chrome console? Would "npm install SaxonJS" work? – Emma May 10 '23 at 14:42
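
Following up on the comments above, here is a minimal sketch of two ways to reach the n-th match with the standard `document.evaluate()` API and no loop. The helper names `nthBySnapshot` and `nthByPredicate` are hypothetical, and the result-type constants are written as numbers only so the sketch also loads outside a browser. As far as I know, the nodes held in a snapshot are the document's real nodes, so they can still be modified; what a snapshot gives up is being updated when the DOM changes.

```javascript
// Numeric values of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE and
// XPathResult.FIRST_ORDERED_NODE_TYPE; in Chrome the named constants can be used instead.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;
const FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper: take a snapshot of all matches and index into it (zero-based).
function nthBySnapshot(doc, xpath, index) {
  return doc
    .evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null)
    .snapshotItem(index);
}

// Hypothetical helper: let XPath pick the match itself.
// XPath positions are one-based, so the 3rd match is position 3.
function nthByPredicate(doc, xpath, position) {
  return doc
    .evaluate(`(${xpath})[${position}]`, doc, null, FIRST_ORDERED_NODE_TYPE, null)
    .singleNodeValue;
}

// In the Chrome console, with the question's HTML loaded:
// nthBySnapshot(document, "//*[starts-with(text(), 'NumberTwo')]", 2);
// nthByPredicate(document, "//*[starts-with(text(), 'NumberTwo')]", 3);
```

In the Chrome DevTools console specifically, the `$x` utility mentioned above already returns an array, so `$x("//*[starts-with(text(), 'NumberTwo')]")[2]` is a one-line alternative that needs no helper.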

3 Answers

1

This will get the third item in your example:

let v = [...new Array(3)].map( () => xpathIterator.iterateNext() ); 
v[2];
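
The same idea can be generalised so the number of matches does not have to be known up front. This is only a sketch: `xpathNodes` is a hypothetical helper, and it assumes nothing more than an object whose `iterateNext()` returns `null` once exhausted, which is how an XPathResult iterator behaves.

```javascript
// Hypothetical helper: wrap anything with an iterateNext() method
// (such as an XPathResult iterator) in a generator so it can be spread.
function* xpathNodes(iterator) {
  let node;
  while ((node = iterator.iterateNext()) !== null) {
    yield node;
  }
}

// In the Chrome console, with a fresh xpathIterator from the question:
// const v = [...xpathNodes(xpathIterator)][2];
```

Spreading collects every node before any of them is touched, which matters because modifying the document while an XPathResult iterator is still being walked invalidates it.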
Arial
  • 326
  • 2
  • 11
0

Inject this into the console:

document.querySelector(".wikitable > tbody").children[6];
AJ Zack
  • 205
  • 1
  • 8
  • Thanks! The purpose of my question is to perform the search via XPath, because, for my industry (QA Selenium Development), we often do not have the luxury of using CSS selectors. Sometimes, the frontend developers produce a website with huge tables and no attributes (no class and no id). The only way to search for something is by using xpath to locate the desired text in the table row. CSS selectors don't work on text nodes. – Emma May 10 '23 at 01:57
  • I have updated my post to perform a text-based XPath search. It will work if you paste the code into Chrome. Thanks! – Emma May 10 '23 at 01:57
0

How would you use CssSelector to get the three "NumberTwo" nodes? After getting the three nodes, how would you access the 3rd node (the "Messi" node) directly? By the way, the five text nodes aren't necessarily located in <ul><li>; they are equally likely to be wrapped by <ol><li> or <table><tr>.

Given the HTML you're showing in your edits, like this:

const allNodes = Array.from(document.querySelectorAll(`ul li, ol li, table tr`));
const allNumberTwoNodes = allNodes.filter(e =>
                              e.textContent.includes(`NumberTwo`)
                          );
console.log(allNumberTwoNodes);

<html>
  <body>
    <ul>
      <li>NumberOne</li>
      <li>NumberTwo_Mbappe</li>
      <li>NumberOne</li>
      <li>NumberTwo_Ronaldo</li>
      <li>NumberTwo_Messi</li>
    </ul>

    <ol>
      <li>NumberOne</li>
      <li>NumberTwo_Mbappe</li>
      <li>NumberOne</li>
      <li>NumberTwo_Ronaldo</li>
      <li>NumberTwo_Messi</li>
    </ol>
    
    <table>
      <tr><td>NumberOne</td></tr>
      <tr><td>NumberTwo_Mbappe</td></tr>
      <tr><td>NumberOne</td></tr>
      <tr><td>NumberTwo_Ronaldo</td></tr>
      <tr><td>NumberTwo_Messi</td></tr>
    </table>
  </body>
</html>

Here, we're relying on textContent, which gives us (unsurprisingly) the text content of a node ignoring tags, which is why even though those table rows have table data cells, the <tr>'s textContent gives us a string as if the <td> markup isn't there.

Also, the "NumberTwo" nodes aren't necessarily the 2nd, 4th, and 5th nodes; they are equally likely to be at the 1-2-5 or 1-4-5 or 3-4-5 positions.

Query selectors, just like XPath, don't care what order the HTML is in: they're going to find "the things that match", not "the thing at the xth position" (unless you bake child position into the selector, just like XPath).
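
To get from "the things that match" to the 3rd match directly, the filtered array can simply be indexed. A minimal sketch, where `nthWithText` is a hypothetical helper name and the only assumption is that each element exposes `textContent`:

```javascript
// Hypothetical helper: keep the elements whose text contains `needle`,
// then return the one at the zero-based `index` (undefined if out of range).
function nthWithText(elements, needle, index) {
  return Array.from(elements).filter((e) => e.textContent.includes(needle))[index];
}

// In the Chrome console, against the <ul> in the HTML above:
// nthWithText(document.querySelectorAll("ul li"), "NumberTwo", 2);
```

Scoping the selector to one container (here `ul li`) keeps the index meaningful; with the combined `ul li, ol li, table tr` selector the array holds the matches from all three containers.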

Mike 'Pomax' Kamermans
  • 49,297
  • 16
  • 112
  • 153
  • Thanks Mike. I have updated my HTML example. How would you use CssSelector to find the three "NumberTwo" in "SectionTwo", skipping the ones in "SectionOne"? Thanks. – Emma May 10 '23 at 15:06
  • By first fixing that HTML, because `<ul>` only allows `<li>` as children, not text. Also, can you please edit your post to just show examples of real data you're working with? The html you're showing has no classes, even though the page you were asking about does: using query selectors is one of the backbones of modern web work so asking "how do I select X from HTML Y" is almost certainly met with "by just looking up how to query-select for that" with millions of posts on SO already and tons of tutorials on the web. So your post should be about your _specific_ problem, not only sort of =) – Mike 'Pomax' Kamermans May 10 '23 at 15:08
  • Fixed with `<li>`'s. – Emma May 10 '23 at 15:11
  • Hi Mike, I was using real data. For internal company websites, the web developers don't care about class's or id's or attributes. They try to deliver a working website within the shortest amount of time. (Ask any Selenium developer how often they encountered this kind of website.) We cannot ask the web developers to change the websites, because they don't have time. – Emma May 10 '23 at 15:18
  • Hi Mike, even for multi-billion production websites, there are frequently elements that can only be distinguished by text only. Try this website: https://www.quora.com/What-is-the-average-cost-of-a-laptop . Quora added the new "Sage" node recently at the top. How would you use CSS to locate that? Sure, you can get all the nodes, and then Array.filter(). But what if I am building my XPath (or Css) in the "Element" tab, and I don't want to switch back and forth between "Element" and "Console"? Can we run JavaScript in the "Element" tab? – Emma May 10 '23 at 15:25
  • Trying to simplify the Question's HTML example by reverting it... if you need the original HTML, here you go: `
    SectionOne
    NumberOne
    NumberTwo_MbappeSectionOne
    NumberOne
    NumberTwo_RonaldoSectionOne
    NumberTwo_MessiSectionOne
    SectionTwo
    NumberOne
    NumberTwo_MbappeSectionTwo
    NumberOne
    NumberTwo_RonaldoSectionTwo
    NumberTwo_MessiSectionTwo
    `
    – Emma May 10 '23 at 15:52
  • Please do not use comment threads for post details: just put those details in your post. Because you're not really talking to "me" when you supply details, you're talking to everyone who can help answer your post, and they won't see what you said if you say it in comments. (Remember the [posting guidelines](/help/how-to-ask)). – Mike 'Pomax' Kamermans May 10 '23 at 16:20