
I am trying to find a way to get the parent (or child) element at a given index in an HTML structure that is loaded into a string. Keep in mind that the string index is all I have: no tag name, class or id.

Suppose I have the following stored in an html variable (the real HTML would usually be much more complicated):

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>

Now I would like to be able to get something like the following:

body > h1

from nothing but a string index, such as the 90 in html[90].

The problem is that the user can pick any index they want, including one that falls inside a tag itself (for example inside <h1>) rather than in its text content.

Right now I am using CodeMirror 6 to retrieve the string index when the user clicks into the editor, if that helps.

  • Your question is unclear. What does the number in your variable represent? How do you expect to get `body > h1` (a CSS selector) from this variable, which appears to reference HTML instead of CSS? What does "index number inside tag itself" mean? – TylerH Jun 23 '22 at 13:25
  • What would be the user use case for this? Why would the user select elements by indexing every character? – Braiam Jun 23 '22 at 13:27
  • Actually, the app should work like this: the HTML is shown in a CodeMirror editor. The user then selects part of the HTML, and a CSS selector should be returned based on the selected part of the text. – Haggerman Swaggerman Jun 26 '22 at 08:23


This isn't all that elegant, but one possibility is to construct an ordered array of all text nodes in the original document. Then, starting from the given index in the original string, find a position that isn't inside a tag (that is, one that is in text content) and insert a character there. (That's the tough part.) After that, build a second array of text nodes from the modified string, find the text node whose value differs, and walk up through its ancestors.
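The last step in that paragraph, spotting which text node changed, is just a comparison of two ordered lists. Here is a minimal sketch of that step on plain string arrays; the helper name firstDifferingIndex and the sample values are hypothetical, for illustration only. One design choice worth noting: the result is an index into the changed list, because inserting a character where there was no text at all (for example between <h1> and </h1>) adds a new text node instead of modifying an existing one, which shifts every later node by one.

```javascript
// Hypothetical helper: given the text-node values of the original and the
// modified document (in document order), return the index in changedValues
// of the first value that differs, or -1 if every value in changedValues
// matches the original.
const firstDifferingIndex = (originalValues, changedValues) => {
  const shared = Math.min(originalValues.length, changedValues.length);
  for (let i = 0; i < shared; i++) {
    if (originalValues[i] !== changedValues[i]) {
      return i;
    }
  }
  // No mismatch in the shared part: if the changed list is longer, the
  // inserted character created a new trailing text node.
  return changedValues.length > originalValues.length ? shared : -1;
};

// Illustrative values only (not the exact text nodes of the sample page).
const before = ['Page Title', 'My First Heading', 'My first paragraph.'];
const after = ['Page Title', 'My First Headi ng', 'My first paragraph.'];
console.log(firstDifferingIndex(before, after)); // 1

// A new text node in the middle (e.g. inside a previously empty <h1></h1>).
console.log(firstDifferingIndex(['a', 'b'], ['a', ' ', 'b'])); // 1
```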

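Determining the position where the character can be inserted comes down to one check. If the next angle bracket after the index is a <, the index is already in text, so the character can go right there. Otherwise the index is inside a tag, and a simple .replace can find the next > and add the character after it.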
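That check can be pulled out into a small standalone function, which also makes it easy to see what happens to an index that lands inside a tag. This sketch mirrors the regex test in the code below but uses indexOf instead; the name findTextInsertionIndex is hypothetical. Like the answer's code, it assumes no < or > appears inside attribute values. One small difference: when no angle bracket follows the index at all, it treats the index as text instead of leaving the string unchanged.

```javascript
// Hypothetical helper: return the string index at which a marker character
// can be inserted so that it lands in text content rather than inside a tag.
// Assumes no '<' or '>' appears inside attribute values.
const findTextInsertionIndex = (str, position) => {
  const rest = str.slice(position);
  const nextOpen = rest.indexOf('<');
  const nextClose = rest.indexOf('>');
  // The next angle bracket is '<' (or there is none): already in text.
  if (nextClose === -1 || (nextOpen !== -1 && nextOpen < nextClose)) {
    return position;
  }
  // Otherwise we are inside a tag: move to just past its closing '>'.
  return position + nextClose + 1;
};

const html = '<body><h1>My First Heading</h1></body>';
console.log(findTextInsertionIndex(html, 12)); // 12 (already in the heading text)
console.log(findTextInsertionIndex(html, 7));  // 10 (inside <h1>, moved to "My")
console.log(findTextInsertionIndex(html, 28)); // 31 (inside </h1>, moved after it)
```

The marker itself is then added with html.slice(0, i) + ' ' + html.slice(i).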

const str = `<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>`;

const getSelector = (str, position) => {
  const startsOutsideTag = /^[^<>]*</.test(str.slice(position));
  const changedStr =
    str.slice(0, position) +
    (startsOutsideTag
      ? ' ' + str.slice(position)
      : str.slice(position).replace('>', '> '));
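  const [, originalNodes] = getDocAndTextNodes(str);
  const [, changedNodes] = getDocAndTextNodes(changedStr);
  // The inserted space either changes an existing text node or creates a new
  // one, so walk the changed list and take the ancestors from the changed
  // document (the inserted space does not alter its element structure).
  for (let i = 0; i < changedNodes.length; i++) {
    if (originalNodes[i]?.nodeValue !== changedNodes[i].nodeValue) {
      return getAncestorNames(changedNodes[i]);
    }
  }
};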

const getDocAndTextNodes = (str) => {
  const doc = new DOMParser().parseFromString(str, 'text/html');
  // https://stackoverflow.com/questions/2579666/getelementsbytagname-equivalent-for-textnodes
  const walker = doc.createTreeWalker(doc, NodeFilter.SHOW_TEXT);

  let node;
  const textNodes = [];

  while(node = walker.nextNode()) {
    textNodes.push(node);
  }
  return [doc, textNodes];
};
const getAncestorNames = (node) => {
  let ancestorNames = [];
  while (node = node.parentElement) {
    ancestorNames.push(node.tagName);
  }
  return ancestorNames.reverse().join(' > ').toLowerCase();
};

console.log(getSelector(str, 90)); // logs "html > body > h1"

Your current HTML doesn't contain closing angle brackets inside attribute values (like <div class=">foo">), and the code above assumes it never will. Angle brackets inside attribute values are pretty unusual, but they would complicate things a bit. To handle them, go through every element of both constructed documents, iterate over each of its attribute values, and .replace the angle brackets with some other character:

const removeBracketsFromAttributeValues = (doc) => {
  for (const elm of doc.querySelectorAll('*')) {
    for (const attribute of elm.attributes) {
      attribute.value = attribute.value.replace(/<|>/g, ' ');
    }
  }
};
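A different technique, offered only as a sketch: mask the brackets directly in the source string instead of in the parsed documents. Replacing each < or > inside a quoted attribute value with a space swaps one character for one character, so every string index (including the one from CodeMirror) still points at the same place, and the masked string can be passed to getSelector in place of the original. The helper name maskBracketsInAttributes is hypothetical, and the scanner assumes attribute values are quoted with " or '; comments, <script>/<style> contents, unquoted values and a bare < in text are not handled.

```javascript
// Hypothetical helper: replace '<' and '>' inside quoted attribute values
// with spaces, keeping the string length (and therefore every index) unchanged.
// Assumes "..." or '...' quoted values; comments, <script>/<style> contents,
// unquoted values and a bare '<' in text are not handled.
const maskBracketsInAttributes = (html) => {
  let result = '';
  let inTag = false; // between a tag's '<' and its closing '>'
  let quote = null;  // the active quote character inside a tag, if any
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (!inTag) {
      if (ch === '<') inTag = true;
      result += ch;
    } else if (quote) {
      if (ch === quote) {
        quote = null;
        result += ch;
      } else {
        // Inside an attribute value: neutralise angle brackets only.
        result += ch === '<' || ch === '>' ? ' ' : ch;
      }
    } else {
      if (ch === '"' || ch === "'") quote = ch;
      else if (ch === '>') inTag = false;
      result += ch;
    }
  }
  return result;
};

const src = '<div class=">foo">bar</div>';
const masked = maskBracketsInAttributes(src);
console.log(masked);                       // <div class=" foo">bar</div>
console.log(masked.length === src.length); // true
```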
CertainPerformance