can't get exclusion of elements in document.querySelectorAll to work

Question

Hoping to avoid repeatedly typing references to anchors by hand, I tried to come up with a way to turn any occurrence of given terms into automatically linked references to a same-named anchor: 'foo' becomes <a href="#foo">foo</a>, 'bar' becomes <a href="#bar">bar</a>, and so on.

I can't, however, seem to get my clunky approach to skip occurrences that are already linked (or that sit inside other elements like style, script, etc.). Positive selection (e.g. nodeList = document.querySelectorAll(".entry-content a"); ) works just fine, but exclusion (e.g. nodeList = document.querySelectorAll(".entry-content:not(a, style, script)"); ) eludes me. I've already sifted through quite a lot of similar-looking questions, but couldn't adapt any of them to my stubborn problem :/ so I must be doing something wrong.

Your help figuring this out is much appreciated. Here's where I'm at right now:

function rep() {
    const nodeList = document.querySelectorAll(".entry-content:not(a, style, script)");

    for (let i = 0; i < nodeList.length; i++) {
        nodeList[i].innerHTML = nodeList[i].innerHTML.replace(/foo/g, '<a href="#foo" style="background: lime;">REPLACED FOO</a>');
    }
}

But this just blatantly replaces every occurrence of 'foo' inside my class="entry-content" element, regardless of the type of element it appears in (instead of leaving a, style and script elements alone).

Thank you for your look at this. Cheers -

hi @Spectric - does this work for you? : https://jsfiddle.net/h9b4evq8/ — eLeXeM, Dec 14 '22 at 02:39
First things first: you don't need to specify the selector if you are using document.querySelectorAll(), just type your class name, that's it. — Shilpe Saxena, Dec 14 '22 at 03:36
I'm not going after a class, tho, but after certain types of elements ;) — eLeXeM, Dec 14 '22 at 09:10

CertainPerformance · Answer 1 · 2022-12-14T13:58:35.880

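:not originally accepted only a single "simple selector" (a selector with only one component to it). A comma-separated list such as a, style, script is not a simple selector, so older browsers reject :not(a, style, script). Current browsers do accept a selector list inside :not, but .entry-content:not(a, style, script) still only filters the .entry-content elements themselves, so it doesn't produce the desired results. You could put the logic to exclude those tags in the JavaScript instead of the selector.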

But that's not enough. Some of the elements you don't want to match are descendants of the .entry-content elements. For example, a <p class="entry-content"> will not be excluded from being a match just because it has an <a> descendant. So just matching .entry-content elements and replacing won't be enough. You'll need some different logic to identify text which has a .entry-content ancestor and also doesn't have a blacklisted tag as an ancestor.

One possibility would be to iterate over text nodes and check, using .closest on each node's parent element, that it has a .entry-content ancestor and no ancestor on the blacklist. The replacement of text nodes with a varying number of possibly non-text nodes is pretty cumbersome, unfortunately. Better to avoid assigning to .innerHTML, though: that re-creates every element inside the container, so references that other scripts hold to those elements end up pointing at detached nodes.

// https://stackoverflow.com/q/2579666
// Collect every text node under <body> into an array
function nativeTreeWalker() {
    const walker = document.createTreeWalker(
        document.body,
        NodeFilter.SHOW_TEXT
    );

    let node;
    const textNodes = [];
    while ((node = walker.nextNode())) {
        textNodes.push(node);
    }
    return textNodes;
}

function rep() {
    // nativeTreeWalker() already returns an array snapshot, so the DOM can be changed while looping
    for (const node of nativeTreeWalker()) {
        const parent = node.parentElement;
        if (!node.textContent.includes('foo') || !parent.closest('.entry-content') || parent.closest('a, style, script')) {
            continue;
        }
        // Degenerate case: a <textarea> can only hold plain text, so the markup ends up as literal text
        if (parent.matches('textarea')) {
            // In modern environments, use `.replaceAll` instead of regex
            parent.textContent = parent.textContent.replaceAll('foo', '<a href="#foo" style="background: lime;">REPLACED FOO</a>');
            continue;
        }
        // At this point, we know this node needs to be replaced.
        // Can't just use parent.innerHTML = parent.innerHTML.replace
        // because other children of the parent (siblings of this node) may be on the blacklist

        // Use a DocumentFragment
        // so the new nodes can be inserted at the right position
        // all at once at the end
        const newNodesFragment = new DocumentFragment();
        // The `s` flag lets `.` match newlines too, so line breaks inside the text node are kept
        for (const match of node.textContent.match(/foo|((?!foo).)+/gs)) {
            if (match !== 'foo') {
                newNodesFragment.append(match);
            } else {
                newNodesFragment.append(
                    Object.assign(
                        document.createElement('a'),
                        {
                            href: '#foo',
                            style: 'background: lime',
                            textContent: 'REPLACED FOO'
                        }
                    )
                );
            }
        }
        // Insert the new nodes
        parent.insertBefore(newNodesFragment, node);
        // Remove the original text node
        parent.removeChild(node);
    }
}

.red {
  background: red;
}

<h1>A Foo is not a foo and a Bar is not a bar</h1>
<p style="color:gray;"> this p has <b>no class</b>, nothing should happen here. Hi, I'm a foo. I'm a bar. Hi, I'm <a href="some link.htm">an already <b>linked foo</b></a>. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm <a href="some link.htm"
    target="_blank">an already <b>linked bar</b></a>. Hi, I'm a foo. I'm a bar.</p>

<p class="entry-content">This p <b>has class .entry-content</b>. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm <a href="some link.htm" target="_blank">an already <b>linked foo - should not be altered</b></a>.
  I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. <span class="red">Hi, I'm a foo</span>. I'm a bar. Hi, I'm a <span><b>foo in a span</b></span>. I'm a bar. Hi, I'm <a href="some link.htm">an already <b>linked bar</b></a>.
  Hi, I'm a foo. I'm a bar. <textarea>Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar. Hi, I'm a foo. I'm a bar.</textarea> </p>

<button onclick="rep()">do rep()</button>
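The snippet above hard-codes the single term 'foo'. As a rough sketch of how the same splitting step could handle several terms at once, the helper below builds one regular expression from a list and splits a text node's content into plain and term segments. The names TERMS, escapeRegExp, splitByTerms and buildFragment are made up for this illustration and are not part of the answer's code; the href format simply assumes each term has a same-named anchor.

```javascript
// Hypothetical term list; each term is assumed to have a same-named anchor.
const TERMS = ['foo', 'bar'];

// Escape regex metacharacters so every term is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split text into segments of the form { text, isTerm }.
function splitByTerms(text, terms = TERMS) {
    if (terms.length === 0) {
        return text === '' ? [] : [{ text, isTerm: false }];
    }
    // Longer terms first, so 'foobar' wins over 'foo' if both are listed.
    const alternation = [...terms]
        .sort((a, b) => b.length - a.length)
        .map(escapeRegExp)
        .join('|');
    // With one capturing group, split() keeps the matches at the odd indexes.
    return text
        .split(new RegExp(`(${alternation})`, 'g'))
        .map((part, index) => ({ text: part, isTerm: index % 2 === 1 }))
        .filter((segment) => segment.text !== '');
}

// Browser only: build the nodes that replace one text node.
function buildFragment(text, terms = TERMS) {
    const fragment = new DocumentFragment();
    for (const segment of splitByTerms(text, terms)) {
        if (!segment.isTerm) {
            fragment.append(segment.text);
            continue;
        }
        const link = document.createElement('a');
        link.href = `#${segment.text}`;
        link.textContent = segment.text;
        fragment.append(link);
    }
    return fragment;
}
```

Inside rep(), the early-exit test would then become something like !TERMS.some((term) => node.textContent.includes(term)), and the fragment would come from buildFragment(node.textContent). Because split() is used instead of a `.`-based pattern, newlines inside the text node are preserved.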

hi, @CertainPerformance; that sure looked promising, but sadly yields the same outcome. Thank you nonetheless — eLeXeM, Dec 14 '22 at 02:45
to add + I don't know if this helps: but the simple selector didn't work with only :not(a) in it, either, when I tried that. :/ (like so `nodeList = document.querySelectorAll(".entry-content:not(a)");` ) — eLeXeM, Dec 14 '22 at 02:56
The fact that the elements you want to exclude can be children of the ones you're matching makes this a much harder problem - you're now dealing with the replacement of text nodes, see edit — CertainPerformance, Dec 14 '22 at 03:36
YES! AWESOME! :D That did it!! I could _never_ have come up with that by myself. Thank you so much! (Y) — eLeXeM, Dec 14 '22 at 09:16
I've been playing with this over the day and it adapts just great into my use-case; with just one question-mark remaining: I'm scattering the treewalkers into my post/s dynamically, where the term is explained (using the Post Snippets plugin under WordPress) + extending the function name with the id of the term auto-reffed to keep more than 1 walkers from falling over each other. I'm seeing the effect that up to the point where it is put into the post all _earlier_ occurrences of e.g. 'foo' are wonderfully made into auto-refs, but _after_ the injection 'foo' is no longer transformed. [ctd] — eLeXeM, Dec 14 '22 at 15:34
if I put 2 into the post, I get the following effect/s, in succession: --- * ('foo' and 'bar' before the transformer for foo : both get transformed) --- * injection of the transformer + definition block for 'foo' --- * ('foo' gets no longer transformed, 'bar' still gets transformed) --- * injection of the transformer + definition block for 'bar' --- * ('foo' and 'bar' no longer get transformed.) [ctd] — eLeXeM, Dec 14 '22 at 15:41
I'm guessing this has something to do with your fine suggestion assuming the insertions would all be made at the end? (going by the in-script-commentary `// Use a DocumentFragment // so the new nodes can be inserted at the right position // all at once at the end`) — eLeXeM, Dec 14 '22 at 15:41
here's a screenshot, probably easier to grasp what's happening :) https://pasteboard.co/xi9rJoCLueJE.png — eLeXeM, Dec 14 '22 at 16:11
I was going after an issue where it first looked like the match value was failing due to spaces in some cases - but as it turns out, there is an extension at work in our environment that foils multi-term matching by inserting soft-hyphens (`&shy;`) per script to improve appearance of justified text across the site. Is there a way for the matcher to ignore soft-hyphens / match string instances regardless of soft-hyphens contained? — eLeXeM, Dec 15 '22 at 15:10
I don't understand what you mean by "the TreeWalkers fall over each other". Run only one walker at a time. If you need to linkify multiple substrings at once, put the logic for that inside the single walker. For example, change to `!(node.textContent.includes('foo') || node.textContent.includes('bar'))` and to `.match(/(?:foo|bar)|((?!foo|bar).)+/g)` and so on, to linkify both `foo` and `bar`. — CertainPerformance, Dec 15 '22 at 15:25
I added some `&shy;`s to the example HTML, but everything still looked to get linkified as expected. — CertainPerformance, Dec 15 '22 at 15:26
re [Question a](#comment132010709_74792985) I must have expressed myself poorly; apologies. I do have a working way to run more than 1 walker in the same post. I was just hoping there was a way for the linkification to continue linkification _below_ its point of insertion as well. Thank you for the added explanation to match more than 1 term per instance. — eLeXeM, Dec 15 '22 at 15:46
re [`&shy;`s] : interesting. If I add manual shys here, matching stops working; see replication of test > https://pasteboard.co/jIgtLcNJh0qW.png — eLeXeM, Dec 15 '22 at 15:52
I "solved" the issue with those automatically inserted `&shy;`s ;) by simply replacing the pesky plugin with a simple css `hyphens: auto;` + matching of multi-word strings works ;) (y) Better on resources, too. — eLeXeM, Dec 15 '22 at 17:02
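For setups where the soft hyphens cannot be removed at the source, one possible way to match a term regardless of soft hyphens (U+00AD, which is what `&shy;` produces) is to allow any number of them between the term's characters. This is only a sketch under that assumption; SOFT_HYPHEN, escapeForRegExp, softHyphenTolerantRegExp, containsTerm and stripSoftHyphens are illustrative names, not something from the thread.

```javascript
// U+00AD is the character that the &shy; entity produces.
const SOFT_HYPHEN = '\u00AD';

// Escape regex metacharacters so each character is matched literally.
function escapeForRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a regex that matches `term` even if soft hyphens sit between its characters.
function softHyphenTolerantRegExp(term) {
    const pattern = Array.from(term)
        .map(escapeForRegExp)
        .join(`${SOFT_HYPHEN}*`);
    return new RegExp(pattern, 'g');
}

// Replacement for the plain `.includes(term)` check.
// A fresh regex is created per call, so no `lastIndex` state leaks between calls.
function containsTerm(text, term) {
    return softHyphenTolerantRegExp(term).test(text);
}

// Remove soft hyphens, e.g. to derive the anchor name from a matched string.
function stripSoftHyphens(text) {
    return text.split(SOFT_HYPHEN).join('');
}
```

With this approach the visible link text can keep its soft hyphens (so justified text still breaks nicely), while the href is built from stripSoftHyphens(match).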

score 0 · Answer 2 · answered Dec 14 '22 at 02:43

You want to create a class to make new anchor elements.

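class anchor {
    constructor(text) {
        let a = document.createElement("a");
        let t = document.createTextNode(text);
        // Link to the same-named anchor on the page, e.g. "#foo"
        a.href = "#" + text;
        a.appendChild(t);
        // Returning an object from a constructor makes `new anchor(...)` evaluate to that <a> element
        return a;
    }
}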

document.body.appendChild(new anchor("foo"));
document.body.appendChild(new anchor("bar"));

a {
  margin:24px;
}

Ronnie Royston


Hi, @ronnie-royston; if I have to attribute a span / class to all the elements I want to change, that kinda defeats the "make things easy" idea. Think Wordpress, where you have to manually dive into any p to add a span with a class to a term you want affected. So, yes; this would of course work technically, but is not answering the question for a way to exclude elements, just offering a different approach at INclusion; sorry; but thanks. :) – eLeXeM Dec 14 '22 at 02:46
in that case just do `querySelectorAll(".entry-content")` then `.forEach` over the collection and do an if `.tagName` based on what tags you want or don't want. – Ronnie Royston Dec 14 '22 at 02:55
can you put this into a workable example? ".tagName" still looks like a class grabber to me (due to the dot). I may be missing something, so a workable example might help. – eLeXeM Dec 14 '22 at 09:13
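As a possible reading of the `.tagName` suggestion: the dot here is ordinary JavaScript property access, not a CSS class selector. Every element has a tagName property, which is reported in upper case for elements in an HTML document. The sketch below filters the direct children of each container by that property; SKIPPED_TAG_NAMES, isSkippedElement and eligibleChildren are names invented for this example.

```javascript
// Tag names to leave alone; tagName is upper case in HTML documents.
const SKIPPED_TAG_NAMES = new Set(['A', 'STYLE', 'SCRIPT']);

// True if the element itself is one of the skipped tags.
function isSkippedElement(element) {
    return SKIPPED_TAG_NAMES.has(element.tagName.toUpperCase());
}

// Collect the direct child elements of each container that are not skipped.
function eligibleChildren(containers) {
    const result = [];
    for (const container of containers) {
        for (const child of container.children) {
            if (!isSkippedElement(child)) {
                result.push(child);
            }
        }
    }
    return result;
}

// Browser usage (not executed here):
// const kept = eligibleChildren(document.querySelectorAll('.entry-content'));
```

Note the limits of this approach: it only looks at child elements, so text sitting directly inside .entry-content is not covered, and a kept child such as a span may itself contain a link. That is why the other answer walks text nodes and checks their ancestors instead.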
