-1

question

How do I get the HTML index of a node, i.e. its character offset within the HTML source string?

Example

Given this HTML:

<div id="main_parag">Sample 789<!-- comment 23 -->29<script>let i = 47;</script>59<strong>69<span id="AA">Get_My_Index</span></strong></div>

How do I get the HTML index of #AA relative to (counting from the start of) #main_parag?

  document.body.innerHTML = '<div id="main_parag">Sample 789<!-- comment 23 -->29<script>let i = 47;</script>59<strong>69<span id="AA">Get_My_Index</span></strong></div>';
  let elt_outer = $('#main_parag')[0];
  let elt_inner = $('#AA')[0];
  let indHtml = get_IndHtml_of_eltInner_in_eltOuter(elt_inner, elt_outer); // expect 71
  
  // Array.from(element.parentNode.children).indexOf(element)
  // ^ This is not what I want: it gives the node index, not the HTML index.
  // Also, `AA` is not a direct child of `main_parag`; it is nested.
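
As a sanity check on the expected value, the example markup can be treated as a plain string. This is only a sketch: it relies on the fragment `<span id="AA">` occurring exactly once, an assumption that holds for this sample but not in general. The names `inner_html` and `needle` are chosen here for illustration.

```javascript
// innerHTML of #main_parag from the example, written out as a plain string.
const inner_html =
  'Sample 789<!-- comment 23 -->29<script>let i = 47;</script>59' +
  '<strong>69<span id="AA">Get_My_Index</span></strong>';

// Assumption: this fragment is unique in the string, so the first hit is the right one.
const needle = '<span id="AA">';
const indHtml = inner_html.indexOf(needle);

console.log(indHtml); // 71
```

The digits in the sample text double as position markers: `29`, `47`, `59` and `69` each start at the offset they spell out.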

comments

  • (the element id may not be available in some cases)
  • (I don't think a regex search is safe: what if the same string occurs more than once?)
Nor.Z

3 Answers

1

Walk over every element with a TreeWalker object, start counting, and stop whenever you find your element.

const root = document.querySelector('#main_parag');
const target = document.querySelector('#AA');

function getIndexOfElement(root, target) {
  const treeWalker = document.createTreeWalker(
    root,
    NodeFilter.SHOW_ELEMENT,
  );

  let index = 0;
  let currentNode = treeWalker.currentNode;

  while (currentNode && currentNode !== target) {
    currentNode = treeWalker.nextNode();
    index++;
  }

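  // nextNode() returns null once the walk is exhausted, i.e. target is not inside root
  return currentNode ? index : -1;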
}

const index = getIndexOfElement(root, target);
console.log(index);
<div id="main_parag">Sample 789
  <!-- comment 23 -->29
  <script>
    let i = 47;
  </script>

  59<strong>69<span id="AA">Get_My_Index</span></strong>
</div>
Emiel Zuurbier
  • I know `TreeWalker` may be helpful for the nested case, but as I said above, `I want html index, not node index`. – Nor.Z Feb 12 '23 at 23:58
  • What does the HTML index mean? That is not a term I've heard anybody use before. – Emiel Zuurbier Feb 13 '23 at 09:03
  • `html index` means the index of a character inside the HTML code; that is what I mean by `(counting the html code string)`. I gave an example above with the expected index, which should show what I mean. I don't know whether such a term exists, so I named it that way. – Nor.Z Feb 13 '23 at 10:01
  • Okay, so just the position of `#AA` within the string? Would using [indexOf](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf) on your string help you get that result? – Emiel Zuurbier Feb 13 '23 at 14:56
  • @EmielZuurbier Because a regex search fails if the same string occurs more than once (as I mentioned in the post). I can partly work around that when the repeats are all `Node.ELEMENT_NODE` nodes, but it gets complicated once `Node.COMMENT_NODE` nodes are involved. – Nor.Z Feb 13 '23 at 23:09
  • `indexOf` doesn't require a regex. It looks for the first occurrence and returns its index. What would the preferred outcome be with multiple or no occurrences anyway? – Emiel Zuurbier Feb 13 '23 at 23:19
  • I don't understand why that wouldn't matter. Regex or not, say two elements `AA` and `BB` have the same `outerHTML` at two different positions. You want the index of `BB`, but the search returns the index of `AA`. That is simply wrong. – Nor.Z Feb 13 '23 at 23:49
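
The ambiguity discussed in the comments above can be reproduced with plain strings. The following is a minimal sketch, not code from either commenter; `html`, `needle` and `allIndexesOf` are hypothetical names introduced for illustration.

```javascript
// Two spans with identical outerHTML at different positions.
const html = '<p><span>Get_My_Index</span> and <span>Get_My_Index</span></p>';
const needle = '<span>Get_My_Index</span>';

// Collect every offset at which `needle` occurs in `haystack`.
function allIndexesOf(haystack, needle) {
  const hits = [];
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    hits.push(from);
    from = haystack.indexOf(needle, from + 1);
  }
  return hits;
}

// indexOf alone always reports the first span, even if the second was meant.
console.log(html.indexOf(needle));       // 3
console.log(allIndexesOf(html, needle)); // [ 3, 33 ]
```

A string search can list all candidate offsets, but nothing in the string says which candidate corresponds to a given DOM node.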
0

You could simply use a more specific selector, e.g.

const span = document.querySelector('#main_parag>strong>span[id="AA"]');

The above is untested, as I have no environment to try it in. I wrote it based on https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors

Queeg
  • The structure provided above is just an example, it can be different in other cases. As I said above `the element id may not be available in some cases` – Nor.Z Feb 13 '23 at 00:00
  • Then build a selector avoiding the id attribute, such as `#main_parag>strong>span`. This assumes then that the span element is the only one in that hierarchy. – Queeg Feb 13 '23 at 12:04
  • The point is not about how to pick the element with a selector, the point is about the index of that element... – Nor.Z Feb 13 '23 at 23:12
0

solution

  • @logic:: insert a node as a unique marker immediately before the node you want to locate, serialize the outer element with innerHTML, then search that string for the marker (remember to remove the marker node once done)

  • @code::

        class RegexUtil {
          // https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
          // https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
          /**
           * @param {String} literal_string
           * @returns {String}
           */
          static escapeRegex(literal_string) {
            return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
          }
    
          /**
           * @param {String} string
           * @returns {String}
           */
          static escapeRegexReplacement(string) {
            return string.replace(/\$/g, '$$$$');
          }
        }
    
        /**
         * @param {Node} node_inner 
         * @param {Element} elt_outer 
         * @returns {Number} return -1 means no match -- node_inner is not inside elt_outer
         */
        function get_IndHtml_of_nodeInner_in_eltOuter(node_inner, elt_outer) {
          if (node_inner === undefined || elt_outer === undefined) {
            throw new ReferenceError();
          }
          if (!(elt_outer.nodeType === Node.ELEMENT_NODE)) {
            throw new TypeError();
          }
    
          const content_SearchOn = elt_outer.innerHTML;
    
          // decide the delimRegexBf
          let html_delimRegexBf;
          let time_now;
          let delimRegexBf_tagName = 'delim-regexbf';
          /** @type {IterableIterator<RegExpMatchArray>} */ let itr;
          let i = 0;
          do {
            i++;
            if (i === 50) {
              throw new Error('Gave up after too many attempts to generate a delimiter string that does not already occur in the searched content. (This should be practically impossible.)');
            }
            time_now = Date.now();
            html_delimRegexBf = '<' + delimRegexBf_tagName + '>' + time_now + '</' + delimRegexBf_tagName + '>';
            itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(html_delimRegexBf), 'g'));
          } while (itr.next().done !== true);
    
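          // Build the delimiter element whose serialized form is html_delimRegexBf
          const elt_delimRegexBf = document.createElement(delimRegexBf_tagName);
          elt_delimRegexBf.textContent = String(time_now);
    
          // A detached node has no parent, so it cannot be inside elt_outer
          if (node_inner.parentNode === null) {
            return -1;
          }
          // Temporarily insert the delimiter directly before node_inner and serialize
          node_inner.parentNode.insertBefore(elt_delimRegexBf, node_inner);
          const content_SearchOn_withDelim = elt_outer.innerHTML;
          // Remove the delimiter again so the DOM is left unchanged
          elt_delimRegexBf.remove();
    
          // Locate the delimiter; the `d` flag (ES2022) exposes the match indices
          itr = content_SearchOn_withDelim.matchAll(new RegExp(RegexUtil.escapeRegex(html_delimRegexBf), 'gd'));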
          const itrEntry_first = itr.next();
          if (itrEntry_first.done === true) {
            // no match -- node_inner is not inside elt_outer
            return -1;
          } else {
            const matcher_first = itrEntry_first.value;
            return matcher_first.indices[0][0];
          }
    
        }
    
    
        // @test-ex
        let elt_outer = $('<div id="main_parag">Sample<!-- comment <span>Get_My_Index</span> --> for testing <span>Get_My_Index</span>; <script>let i = 47;</script>Works?<span>A question<span>Get_My_Index</span></span></div>')[0];
        const treeWalker = document.createTreeWalker(elt_outer, NodeFilter.SHOW_ALL);
        /** @type {Node} */ let node_curr;
        while ((node_curr = treeWalker.nextNode())) {
          const nt = node_curr.nodeType;
          if (nt === Node.TEXT_NODE) {
            console.log(node_curr.textContent);
          } else if (nt === Node.ELEMENT_NODE) {
            console.log(node_curr.outerHTML);
          } else {
            console.log(node_curr.nodeValue);
          }
          console.log(get_IndHtml_of_nodeInner_in_eltOuter(node_curr, elt_outer));
        }
        console.log(elt_outer.innerHTML);
    
        // @test-ex
        elt_outer = $('<div id="main_parag">nothing to search</div>')[0];
        node_curr = $('<span id="non_exist">?</span>')[0];
        console.log(get_IndHtml_of_nodeInner_in_eltOuter(node_curr, elt_outer));
    
    • output

      // Sample
      // 0
      //  comment <span>Get_My_Index</span> 
      // 6
      //  for testing 
      // 48
      // <span>Get_My_Index</span>
      // 61
      // Get_My_Index
      // 67
      // ; 
      // 86
      // <script>let i = 47;</script>
      // 88
      // let i = 47;
      // 96
      // Works?
      // 116
      // <span>A question<span>Get_My_Index</span></span>
      // 122
      // A question
      // 128
      // <span>Get_My_Index</span>
      // 138
      // Get_My_Index
      // 144
      // Sample<!-- comment <span>Get_My_Index</span> --> for testing <span>Get_My_Index</span>; <script>let i = 47;</script>Works?<span>A question<span>Get_My_Index</span></span>
      // -1
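
The delimiter search above does not strictly need a regex, because the delimiter is a literal string. A minimal sketch of that simplification follows, using `String.prototype.includes` and `indexOf`. The helper name `pickUniqueMarker`, its parameters, and the `-attempt` suffix are assumptions made for this illustration, and the DOM insertion step is simulated with string slicing so the snippet runs without a browser.

```javascript
// Return a marker (text payload plus serialized HTML) that does not occur in `content`.
function pickUniqueMarker(content, tagName = 'delim-regexbf', maxTries = 50) {
  for (let attempt = 0; attempt < maxTries; attempt++) {
    // The attempt counter keeps retries distinct even within one millisecond.
    const payload = `${Date.now()}-${attempt}`;
    const html = `<${tagName}>${payload}</${tagName}>`;
    if (!content.includes(html)) {
      return { payload, html };
    }
  }
  throw new Error(`No unused marker found after ${maxTries} attempts`);
}

// Stand-in for the DOM step: splice the marker into the string at the place
// where insertBefore() would have put the delimiter element.
const content = 'Sample<span>Get_My_Index</span>';
const marker = pickUniqueMarker(content);
const withMarker = content.slice(0, 6) + marker.html + content.slice(6);

console.log(withMarker.indexOf(marker.html)); // 6
```

In the browser version, `marker.payload` would become the delimiter element's text content, and `elt_outer.innerHTML.indexOf(marker.html)` would replace the `matchAll` call.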
      

comment (not important)

  • It turns out a string search is the simplest way to do it; brute-forcing a unique delimiter into the searched content is what makes it reliable.

  • Before that, I tried to build the HTML string representation of every node type while walking the tree with Node.previousSibling / Node.parentNode...

  • (jsfiddle / the code snippet somehow doesn't work with my code, but the code runs fine from VS Code; I don't know whether this is a Node.js versus browser issue...)
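
The sibling-and-parent walk mentioned in the notes above can be sketched without touching the DOM at all. This is a rough sketch under stated assumptions, not the approach the answer ended up using: `htmlOffset`, `serializedLength` and `openingTagLength` are hypothetical helpers, only element, text and comment nodes are handled, `<template>` content is ignored, and the text escaping follows the HTML fragment serialization rules as I understand them (with scripting assumed enabled for `noscript`).

```javascript
// Same numeric values as Node.ELEMENT_NODE, Node.TEXT_NODE, Node.COMMENT_NODE.
const ELEMENT = 1, TEXT = 3, COMMENT = 8;

// Parents whose text children are serialized without escaping.
const RAW_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext', 'noscript',
]);

// Length of the HTML that innerHTML would emit for a single node.
function serializedLength(node) {
  if (node.nodeType === ELEMENT) return node.outerHTML.length;
  if (node.nodeType === COMMENT) return '<!--'.length + node.data.length + '-->'.length;
  if (node.nodeType === TEXT) {
    const parentName = node.parentNode ? node.parentNode.localName : '';
    if (RAW_TEXT_PARENTS.has(parentName)) return node.data.length;
    return node.data
      .replace(/&/g, '&amp;')
      .replace(/\u00a0/g, '&nbsp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;').length;
  }
  throw new TypeError('Unsupported node type: ' + node.nodeType);
}

// outerHTML = opening tag + innerHTML + "</name>"; valid for elements that
// have children, which excludes void elements such as <br>.
function openingTagLength(elt) {
  const closingTagLength = elt.localName.length + 3;
  return elt.outerHTML.length - elt.innerHTML.length - closingTagLength;
}

// Offset of `inner` inside outer.innerHTML, or -1 if `inner` is not a descendant.
function htmlOffset(inner, outer) {
  let offset = 0;
  let node = inner;
  while (node !== outer) {
    // Everything serialized before this node at the same level.
    for (let sib = node.previousSibling; sib; sib = sib.previousSibling) {
      offset += serializedLength(sib);
    }
    const parent = node.parentNode;
    if (!parent) return -1; // reached the top without meeting `outer`
    if (parent !== outer) offset += openingTagLength(parent);
    node = parent;
  }
  return offset;
}

// Browser usage (assumes the example markup from the question is in the page):
// htmlOffset(document.querySelector('#AA'), document.querySelector('#main_parag'));
```

Unlike the delimiter approach, this never mutates the DOM, at the cost of re-implementing part of the serializer and inheriting its edge cases.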

Nor.Z