Is there an alternative to jQuery / sizzle that supports textNodes as first class citizens in selectors?

Question

I've discovered that I have a need for selectors with full support for DOM textNodes that jQuery doesn't provide.

jQuery ignores text nodes, probably because most pages have tons of irrelevant blank ones between tags that the various browsers can treat differently.

Most answers to jQuery questions about text nodes come down to using the .contents() function which returns all the child nodes for the selected items, including text nodes - all other jQuery APIs ignore text nodes.

Usually whatever you need can easily be built on top of .contents(), but I have found myself in a situation where it can't.

My use case is that I want to locate and then wrap arbitrary runs of text in 3rd-party web pages over which I have no control. (Think browser extension or userscript.)
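To make the "locate and then wrap" step concrete, here is a minimal sketch of the text-splitting half of it. The function name `splitByPattern` and the segment shape are made up for illustration. In a browser, each matching segment would presumably become a span and each non-matching one a plain text node replacing the original text node; that DOM step is left out so the snippet runs anywhere.

```javascript
// Split `text` into consecutive segments, flagging the parts that match
// `pattern`. `pattern` must be a RegExp; it is copied with the global
// flag added so the caller's regex object is never modified.
function splitByPattern(text, pattern) {
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const segments = [];
  let last = 0;
  for (const m of text.matchAll(re)) {
    if (m[0] === "") continue; // ignore empty matches
    if (m.index > last) {
      segments.push({ text: text.slice(last, m.index), match: false });
    }
    segments.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), match: false });
  }
  return segments;
}
```

Concatenating the `text` of all segments gives back the original string, so replacing one text node with the generated nodes should not change what the page displays.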

So far I've been happy to walk the DOM looking for all text nodes or find a wrapper element that contains all the text nodes I'm interested in and use .contents() to iterate through them.

But now I have found that I sometimes need the full power of jQuery/sizzle selectors to narrow my focus down to certain possibilities of classes within classes etc.

I considered ways to extend jQuery with a textNode selector but that seems to be impossible due to a pervasive rule of ignoring text nodes which would filter many of them out before my extension gets called.

Thus I'm looking for some other JavaScript tool which offers something like selectors but allows selecting text nodes arbitrarily mixed in its selector expression syntax.

Here's an example of what I might need to do:

$('.ii:even > div > TXT, .ii:even > div > div.im > TXT')

Here's an example I personally haven't needed yet but can easily imagine:

$('#something .somethingElse TXT')

When you can address (select) the immediate parent(s) of the text nodes, iterating over their .contents() is easy. It is much harder when you can only identify some arbitrary ancestor but want all the text nodes below it, something that is of course trivial for element nodes.
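For the "arbitrary ancestor" case, one possible workaround (not a selector engine, just a sketch) is to select the ancestor with an ordinary selector and then walk down from it yourself. The function name `collectTextNodes` and the `keepBlank` parameter are my own inventions. The function only relies on `nodeType`, `nodeValue` and `childNodes`, so it should work on real DOM nodes as well as on plain objects shaped like them.

```javascript
// Node type constants as defined by the DOM specification.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Collect every text node below `root`, in document order.
// Whitespace-only text nodes are skipped unless `keepBlank` is true.
// Comments and other node types are ignored.
function collectTextNodes(root, keepBlank = false) {
  const found = [];
  (function walk(node) {
    for (const child of Array.from(node.childNodes || [])) {
      if (child.nodeType === TEXT_NODE) {
        if (keepBlank || /\S/.test(child.nodeValue)) found.push(child);
      } else if (child.nodeType === ELEMENT_NODE) {
        walk(child); // descend into nested elements
      }
    }
  })(root);
  return found;
}
```

With jQuery this could be called on each element matched by something like `$('#something .somethingElse')`, passing `this` inside `.each()`. If the matched elements are nested inside one another, the same text node will be collected more than once, so the results would need de-duplicating.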

Why can't you narrow the focus down to the specific elements, then use `.each()` to iterate the results and call `.contents()` on each result? CSS-style selectors simply won't target text nodes directly. — I Hate Lazy, Dec 09 '12 at 23:22
Because it needs to support arbitrary 3rd party web pages whose structures I can't anticipate ahead of time. The best way to tackle each new one is to devise a way to address the relevant parts. Selectors are a mechanism to address arbitrary parts of a DOM. They don't have to be CSS-style. sizzle already allows you to target many things CSS alone "simply won't target". Another implementation can do whatever it wants. XPath, for instance does apparently have a way to address text nodes. For all I know there are other ways besides CSS style and Xpath style? `\-:` — hippietrail, Dec 09 '12 at 23:29
right, but why does it need to be a selector? Your pseudo selector is targeting `... > TXT`, so why not just eliminate the end, and get the `.contents()` of each match? But if you really want custom selectors, you can extend Sizzle to some degree with your own filters. But those non-standard Sizzle selectors should be avoided IMO. — I Hate Lazy, Dec 09 '12 at 23:38
I don't see how finding a selector would make it any easier than using `contents()` and `filter()`. Just filter `nodeType` — charlietfl, Dec 09 '12 at 23:44
@IHateLazy: You could be right. I've been tackling this problem on several fronts for a while now so I'm sure I haven't tried everything yet. I had a previous question I'll try to link to about a custom pseudo selector but my reading suggested the other parts of jQuery/sizzle would filter out all the text nodes before my code would get to them. — hippietrail, Dec 09 '12 at 23:44
Yeah, I don't really remember how the custom filters work, but if it could be done, I'd think you'd have to do something like `'.ii:even > div:TXT, .ii:even > div > div.im:TXT'` and then have the custom filter take the matched elements, and instead put the child text nodes in the result. Not exactly sure though. — I Hate Lazy, Dec 09 '12 at 23:47
Hmm using `.contents()` and `.filter()` would I see the elements in the order of my query or in the order in the DOM? For `` with `$('foo,bar')` would I see `foo, bar, foo` or `foo, foo, bar`? Is either order guaranteed? If it's guaranteed to be DOM order that might be what I seek ... — hippietrail, Dec 09 '12 at 23:50
I'll leave the question here though because there's still the chance of a literal answer of other libraries which address (or select) bits of DOM in different ways. — hippietrail, Dec 09 '12 at 23:51
only way to really help is with some real world html and some sort of filter conditions. Hard to know what you need to accomplish from explanation — charlietfl, Dec 09 '12 at 23:55
@charlietfl: Basically i'm making a browser extension that lets either a developer or ideally a user specify arbitrary text patterns to find in arbitrary web pages that will then have spans inserted to add colour-coded highlighting. `
Name: Fred Blogg
` -> `
Name: Fred Blogg
` but for arbitrary real-world web pages where the HTML and the text can be far less trivial. — hippietrail, Dec 10 '12 at 00:02
sounds like more of a regex and html parsing headache than finding textNodes which is fairly easy — charlietfl, Dec 10 '12 at 00:06
one simple thought is just wrap each word of the input. That way you don't have as much problem with text that spans different nodes — charlietfl, Dec 10 '12 at 00:08
The regex part is the easy part (-: The problem is finding just the matching text in the matching bits of DOM. I don't want every "Name", just ones in certain bits of web pages which are generated by all kinds of bizarre tools out there. So far one fortunate restriction is that it doesn't (yet) need to deal with text that crosses tag boundaries. — hippietrail, Dec 10 '12 at 00:09
For my current specific problem I took another path, but I still think this is an interesting general problem. One case where iterating over `.contents()` won't work is where you don't know the direct parent and want to select by some arbitrary ancestor as you can do fine with tags, `id`s, `class`es, etc: `#foo .bar span` is possible but `#foo .bar TXT` is not possible. Iterating over the parent's contents would work only with ... `.bar >` ... — hippietrail, Dec 12 '12 at 04:24

Matey Yanakiev · Answer 1 · 2012-12-12T04:39:48.207

Here is something you could do:

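// Returns the direct child text nodes of each matched element.
// val:   optional; a string (exact nodeValue match) or a RegExp (tested against nodeValue)
// _case: truthy to lowercase the nodeValue (and a string val) before comparing
jQuery.fn.getTextNodes = function(val,_case) {
    var nodes = [],
        noVal = typeof val === "undefined",
        regExp = !noVal && jQuery.type(val) === "regexp",
        nodeType, nodeValue;
    if (!noVal && _case && !regExp) val = val.toLowerCase();
    this.each(function() {
        // Skip matched nodes that are themselves text (3) or comment (8) nodes
        if ((nodeType = this.nodeType) !== 3 && nodeType !== 8) {
            jQuery.each(this.childNodes, function() {
                // Only direct children that are text nodes are considered
                if (this.nodeType === 3) {
                    nodeValue = _case ? this.nodeValue.toLowerCase() : this.nodeValue;
                    if (noVal || (regExp ? val.test(nodeValue) : nodeValue === val)) nodes.push(this);
                }
            });
        }
    });
    // pushStack keeps the previous selection reachable through .end()
    return this.pushStack(nodes, "getTextNodes", val || "");
};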

Then you could use the following:

$("selector").getTextNodes("text to match");

Here is a JSFiddle.

.getTextNodes() works as follows. If you don't pass an argument, it returns all text nodes that are direct children of the matched elements. If you pass it a string, it returns only the text nodes whose nodeValue is exactly that string. When passing a string, set the second argument to a truthy value for a case-insensitive comparison. The first argument can also be a regular expression, which is tested against each nodeValue.
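To spell out those matching rules, here is a standalone sketch of the comparison applied to a single nodeValue. The name `textMatches` is hypothetical, and it uses `instanceof RegExp` in place of `jQuery.type()` so that it runs without jQuery; otherwise it is meant to mirror the logic of the plugin above.

```javascript
// Decide whether one text node value passes the filter.
// nodeValue:  the text node's value
// val:        undefined, a string, or a RegExp
// ignoreCase: truthy to lowercase nodeValue (and a string val) first
function textMatches(nodeValue, val, ignoreCase) {
  if (typeof val === "undefined") return true; // no filter: keep every text node
  const value = ignoreCase ? nodeValue.toLowerCase() : nodeValue;
  if (val instanceof RegExp) return val.test(value);
  return value === (ignoreCase ? val.toLowerCase() : val);
}
```

Two consequences worth knowing: string matching is exact, so surrounding whitespace in the text node makes it fail, and with a truthy second argument a regular expression is tested against the lowercased text, so for a case-insensitive regex it is probably safer to use the `i` flag instead.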

Hope this helps.

Edit: Note that you can also chain .end() afterwards, as in $("selector").getTextNodes("text to match").end(), to get back to the original selection, since the plugin uses .pushStack().

This looks interesting. I'll come back and vote after I have a chance to play with it - thanks. — hippietrail, Dec 12 '12 at 04:26
