This isn't all that elegant, but one possibility is to construct an ordered array of all text node values in the original document. Then, using the original string, determine a position that isn't inside an attribute (that is, is inside a text node), and insert a character there. (That's the tough part.) After that, you can create another array of text nodes from the modified string, identify the text node that's different, and figure out its ancestors.
Determining where the character can be inserted can be done with a simple .replace: if the given position falls inside a tag, find the next > and add the character after it; otherwise, insert the character at the position itself.
const str = `<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>`;
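const getSelector = (str, position) => {
  // If the next angle bracket after `position` is "<", the position is inside text.
  const startsOutsideTag = /^[^<>]*</.test(str.slice(position));
  // Insert a space at the position, or just after the enclosing tag's ">".
  const changedStr =
    str.slice(0, position) +
    (startsOutsideTag
      ? ' ' + str.slice(position)
      : str.slice(position).replace('>', '> '));
  const [originalDoc, originalNodes] = getDocAndTextNodes(str);
  const [changedDoc, changedNodes] = getDocAndTextNodes(changedStr);
  // Walk the changed list: the insertion may have created a text node
  // that has no counterpart at the same index in the original list.
  for (let i = 0; i < changedNodes.length; i++) {
    if (originalNodes[i]?.nodeValue !== changedNodes[i].nodeValue) {
      return getAncestorNames(changedNodes[i]);
    }
  }
};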
const getDocAndTextNodes = (str) => {
  const doc = new DOMParser().parseFromString(str, 'text/html');
  // https://stackoverflow.com/questions/2579666/getelementsbytagname-equivalent-for-textnodes
  // Collect every text node in document order.
  const walker = doc.createTreeWalker(doc, NodeFilter.SHOW_TEXT);
  let node;
  const textNodes = [];
  while ((node = walker.nextNode())) {
    textNodes.push(node);
  }
  return [doc, textNodes];
};
const getAncestorNames = (node) => {
  const ancestorNames = [];
  // Climb from the text node to the root, collecting element names.
  while ((node = node.parentElement)) {
    ancestorNames.push(node.tagName);
  }
  return ancestorNames.reverse().join(' > ').toLowerCase();
};
console.log(getSelector(str, 90));
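To see what the insertion step does in isolation, here is a minimal sketch that needs no DOM. The helper name insertMarker and the sample string are made up for illustration; the logic mirrors the first few lines of getSelector above.

```javascript
// Hypothetical helper (not part of the answer's code): the insertion step only.
const insertMarker = (str, position) => {
  const rest = str.slice(position);
  // If the next angle bracket after `position` is "<", the position is in text.
  const startsOutsideTag = /^[^<>]*</.test(rest);
  return str.slice(0, position) +
    (startsOutsideTag ? ' ' + rest : rest.replace('>', '> '));
};

const sample = '<p>Hello</p>';
const inText = insertMarker(sample, 5); // position 5 is the first "l" of "Hello"
const inTag = insertMarker(sample, 1);  // position 1 is the "p" of the opening tag
console.log(inText); // <p>He llo</p>
console.log(inTag);  // <p> Hello</p>
```

In both cases the extra space ends up in the text of the p element, which is why comparing text nodes afterwards can point back to the right element.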
Your current markup doesn't contain closing angle brackets inside attribute values, like <div class=">foo">, and the code above assumes that. Having closing angle brackets inside attribute values is pretty unusual, but it would complicate things a bit. To handle it, given both constructed documents, iterate through all elements, iterate through each of their attribute values, and .replace angle brackets with some other character.
const removeBracketsFromAttributeValues = (doc) => {
  for (const elm of doc.querySelectorAll('*')) {
    for (const attribute of elm.attributes) {
      // Swap angle brackets for a harmless character.
      attribute.value = attribute.value.replace(/<|>/g, ' ');
    }
  }
};
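To illustrate why a > inside an attribute value matters, the sketch below reduces the in-tag branch of getSelector to a hypothetical one-liner (insertAfterNextBracket is a name made up here) and runs it on the problematic markup. No DOM is involved; it only shows where the space lands.

```javascript
// Hypothetical reduction of the in-tag branch: add a space after the next ">".
const insertAfterNextBracket = (str, position) =>
  str.slice(0, position) + str.slice(position).replace('>', '> ');

const tricky = '<div class=">foo">text</div>';
const result = insertAfterNextBracket(tricky, 2); // position 2 is inside "<div"
console.log(result); // <div class="> foo">text</div>
```

The space lands inside the class value rather than in a text node, so the text node comparison would find no difference for this input.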