How to get the html index (counting in units of each html code string character) of a node?

Question

question

ex

Suppose you have this HTML code:

<div id="main_parag">Sample 789<!-- comment 23 -->29<script>let i = 47;</script>59<strong>69<span id="AA">Get_My_Index</span></strong></div>

How can I get the HTML index of #AA relative to (starting from) #main_parag?

  document.body.innerHTML = '<div id="main_parag">Sample 789<!-- comment 23 -->29<script>let i = 47;</script>59<strong>69<span id="AA">Get_My_Index</span></strong></div>';
  let elt_outer = $('#main_parag')[0];
  let elt_inner = $('#AA')[0];
  let indHtml = get_IndHtml_of_eltInner_in_eltOuter(elt_inner, elt_outer); // expect 71
  
  // Array.from(element.parentNode.children).indexOf(element) 
  // ^ this is not what I wanted, I want html index, not node index; 
  // plus that `AA` is not a direct child, but nested

comments

(The element id may not be available in some cases.)
(I don't think a regex search is safe: what if the same string occurs multiple times?)
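
To illustrate the concern about repeated strings, here is a minimal sketch on a plain string. The markup is a made-up example, not the real document: it contains two spans with identical `outerHTML`, and a plain search can only report the first one unless you already know where to start looking.

```javascript
// Hypothetical markup: two spans with identical outerHTML at different positions.
const html = 'x<span>Get_My_Index</span>yy<span>Get_My_Index</span>';
const needle = '<span>Get_My_Index</span>';

// indexOf always reports the first occurrence...
const first = html.indexOf(needle);
// ...so reaching the second one requires already knowing where to start.
const second = html.indexOf(needle, first + 1);

console.log(first, second); // 1 28
```

If the node of interest is the second span, the search alone cannot tell the two matches apart.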

@SebastianSimon My point was not to get the element; I want to get the index. It can be used, e.g., for comparing node positions inside a document. – Nor.Z, Feb 12 '23 at 22:19

score 1 · Answer 1 · answered Feb 12 '23 at 22:44

Walk over every element with a TreeWalker object, start counting, and stop whenever you find your element.

const root = document.querySelector('#main_parag');
const target = document.querySelector('#AA');

function getIndexOfElement(root, target) {
  const treeWalker = document.createTreeWalker(
    root,
    NodeFilter.SHOW_ELEMENT,
  );

  let index = 0;
  let currentNode = treeWalker.currentNode;

  while (currentNode && currentNode !== target) {
    currentNode = treeWalker.nextNode();
    index++;
  }

  return index;
}

const index = getIndexOfElement(root, target);
console.log(index);

<div id="main_parag">Sample 789
  <!-- comment 23 -->29
  <script>
    let i = 47;
  </script>

  59<strong>69<span id="AA">Get_My_Index</span></strong>
</div>

Emiel Zuurbier


I know `TreeWalker` may be helpful for the nested case, but as I said above, `I want html index, not node index`. – Nor.Z Feb 12 '23 at 23:58
What does the HTML index mean? That is not a term I've heard anybody use before. – Emiel Zuurbier Feb 13 '23 at 09:03
`html index` stands for the index of a character inside the HTML code; that's what I mean by `(counting the html code string)`. I gave an example above with the expected index, so you may understand what I mean from that. I don't know whether such a term even exists, so I named it that way. – Nor.Z Feb 13 '23 at 10:01
Okay, so just the position of `#AA` within the string? Would using [indexOf](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf) on your string help you get that result? – Emiel Zuurbier Feb 13 '23 at 14:56
@EmielZuurbier Because a regex find fails if the same string occurs multiple times (as I mentioned in the post). I can more or less fix that if the duplicates are all Node.ELEMENT_NODE, but it gets complicated when Node.COMMENT_NODE nodes are involved. – Nor.Z Feb 13 '23 at 23:09
`indexOf` doesn't require a regex. It looks for the first occurrence and returns its index. What would be the preferred outcome with multiple or no occurrences anyway? – Emiel Zuurbier Feb 13 '23 at 23:19
I don't understand why that wouldn't matter. With a plain string search (regex or not, it doesn't matter), say two elements `AA` and `BB` have the same `outerHTML` at two different positions: you want the index of `BB`, but it returns the index of `AA`. That's just wrong. – Nor.Z Feb 13 '23 at 23:49

score 0 · Answer 2 · answered Feb 12 '23 at 22:59

So you could use just a more suitable selector, e.g.

const span = document.querySelector('#main_parag>strong>span[id="AA"]');

The above is just theoretical; I have no environment to test it. I created it using https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors

Queeg


The structure provided above is just an example; it can be different in other cases. As I said above, `the element id may not be available in some cases`. – Nor.Z Feb 13 '23 at 00:00
Then build a selector avoiding the id attribute, such as `#main_parag>strong>span`. This assumes then that the span element is the only one in that hierarchy. – Queeg Feb 13 '23 at 12:04
The point is not about how to pick the element with a selector, the point is about the index of that element... – Nor.Z Feb 13 '23 at 23:12

score 0 · Answer 3 · answered Feb 15 '23 at 13:08

solution

@logic:: add a node as a unique marker right before the node you want to locate, then use a regex to find that marker in the serialized HTML (remember to remove the marker node once done)

@code::

    class RegexUtil {
      // https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
      // https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
      /**
       * @param {String} literal_string
       * @returns {String}
       */
      static escapeRegex(literal_string) {
        return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
      }

      /**
       * @param {String} string
       * @returns {String}
       */
      static escapeRegexReplacement(string) {
        return string.replace(/\$/g, '$$$$');
      }
    }

    /**
     * @param {Node} node_inner 
     * @param {Element} elt_outer 
     * @returns {Number} return -1 means no match -- node_inner is not inside elt_outer
     */
    function get_IndHtml_of_nodeInner_in_eltOuter(node_inner, elt_outer) {
      if (node_inner === undefined || elt_outer === undefined) {
        throw new ReferenceError();
      }
      if (!(elt_outer.nodeType === Node.ELEMENT_NODE)) {
        throw new TypeError();
      }

      const content_SearchOn = elt_outer.innerHTML;

      // decide the delimRegexBf
      let html_delimRegexBf;
      let time_now;
      let delimRegexBf_tagName = 'delim-regexbf';
      /** @type {IterableIterator<RegExpMatchArray>} */ let itr;
      let i = 0;
      do {
        i++;
        if (i === 50) {
          throw new Error('Many loops tried, Unable to insert regex_brute_force_delimiter as hardcode_string_indicator in searching content. (The chance of this happening is nearly impossible.)');
        }
        time_now = Date.now();
        html_delimRegexBf = '<' + delimRegexBf_tagName + '>' + time_now + '</' + delimRegexBf_tagName + '>';
        itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(html_delimRegexBf), 'g'));
      } while (itr.next().done !== true);

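      // Build the marker element that will be placed right before node_inner.
      const elt_delimRegexBf = document.createElement(delimRegexBf_tagName);
      elt_delimRegexBf.textContent = String(time_now);

      // A detached node has no parent, so it cannot be inside elt_outer.
      if (node_inner.parentNode === null) {
        return -1;
      }
      // Insert the marker, serialize once, then remove it again to restore the DOM.
      node_inner.parentNode.insertBefore(elt_delimRegexBf, node_inner);
      const content_SearchOn_withDelim = elt_outer.innerHTML;
      elt_delimRegexBf.remove();

      // The marker's position in the serialized HTML is the index of node_inner.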
      itr = content_SearchOn_withDelim.matchAll(new RegExp(RegexUtil.escapeRegex(html_delimRegexBf), 'gd'));
      const itrEntry_first = itr.next();
      if (itrEntry_first.done === true) {
        // no match -- node_inner is not inside elt_outer
        return -1;
      } else {
        const matcher_first = itrEntry_first.value;
        return matcher_first.indices[0][0];
      }

    }


    // @test-ex
    let elt_outer = $('<div id="main_parag">Sample<!-- comment <span>Get_My_Index</span> --> for testing <span>Get_My_Index</span>; <script>let i = 47;</script>Works?<span>A question<span>Get_My_Index</span></span></div>')[0];
    const treeWalker = document.createTreeWalker(elt_outer, NodeFilter.SHOW_ALL);
    /** @type {Node} */ let node_curr;
    while ((node_curr = treeWalker.nextNode())) {
      const nt = node_curr.nodeType;
      if (nt === Node.TEXT_NODE) {
        console.log(node_curr.textContent);
      } else if (nt === Node.ELEMENT_NODE) {
        console.log(node_curr.outerHTML);
      } else {
        console.log(node_curr.nodeValue);
      }
      console.log(get_IndHtml_of_nodeInner_in_eltOuter(node_curr, elt_outer));
    }
    console.log(elt_outer.innerHTML);

    // @test-ex
    elt_outer = $('<div id="main_parag">nothing to search</div>')[0];
    node_curr = $('<span id="non_exist">?</span>')[0];
    console.log(get_IndHtml_of_nodeInner_in_eltOuter(node_curr, elt_outer));

output

// Sample
// 0
//  comment <span>Get_My_Index</span> 
// 6
//  for testing 
// 48
// <span>Get_My_Index</span>
// 61
// Get_My_Index
// 67
// ; 
// 86
// <script>let i = 47;</script>
// 88
// let i = 47;
// 96
// Works?
// 116
// <span>A question<span>Get_My_Index</span></span>
// 122
// A question
// 128
// <span>Get_My_Index</span>
// 138
// Get_My_Index
// 144
// Sample<!-- comment <span>Get_My_Index</span> --> for testing <span>Get_My_Index</span>; <script>let i = 47;</script>Works?<span>A question<span>Get_My_Index</span></span>
// -1

comment (not important)

It turns out that a regex search is the simplest way to do it; brute-forcing a unique delimiter into the searched content is quite helpful.
Before that, I was trying to build the HTML string representation for every node type while walking through Node.previousSibling / Node.parentNode.
(JSFiddle / the code snippet somehow doesn't work with my code, but it runs fine from VS Code; I don't know whether there is some difference between Node.js and the browser.)
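
As a side note on the delimiter search above: `Date.now()` has millisecond resolution, so consecutive retries of the loop may produce the same candidate. A counter avoids that. The following is only a sketch under assumptions: `pickUniqueMarker` and the `delim-marker` tag name are hypothetical names, and the insertion is simulated on a plain string instead of the DOM so that it runs anywhere.

```javascript
// Hypothetical helper (not part of the code above): return a marker tag
// that does not occur anywhere in `html`, trying counters 0, 1, 2, ...
function pickUniqueMarker(html, tagName = 'delim-marker') {
  for (let n = 0; ; n++) {
    const marker = `<${tagName}>${n}</${tagName}>`;
    if (!html.includes(marker)) {
      return marker;
    }
  }
}

// String-only simulation of "insert the marker right before the target".
const html = 'a<delim-marker>0</delim-marker>b<span>x</span>';
const marker = pickUniqueMarker(html); // counter 0 is already taken, so 1 is used
const targetPos = html.indexOf('<span>');
const withMarker = html.slice(0, targetPos) + marker + html.slice(targetPos);

// A plain indexOf is enough to locate the marker; no regex escaping is needed.
const found = withMarker.indexOf(marker);
console.log(marker, found === targetPos);
```

In the DOM version, the marker element would be inserted with `insertBefore` and located in `elt_outer.innerHTML` in the same way; whether the counter is preferable to the timestamp is a matter of taste.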
