Consider the following HTML:

<div class="status-date">
  <strong>Date Available:</strong> 
  10/05/2016
</div>

I'd expect the :not() selector to be capable of targeting the date string "10/05/2016" as follows:

.status-date *:not(strong) {
  text-decoration: underline;
}

Two questions:
1. Is the :not() selector capable of this?
2. If not, is any CSS selector capable of this?

Context: this is actually not about styling the text nodes. I am doing some web scraping and I'd like to ignore the <strong> tag in this case. If it were about styling, I could target the div directly and overwrite the styles on the <strong> to "cancel it out".

Further context: I can see that my naïve attempt doesn't work as expected, as shown in this CodePen (http://codepen.io/anon/pen/rWezQK). It's possible, though, that I'm misunderstanding something deep about the selector or the DOM structure I've described.

BoltClock
Joseph Fraley
  • The answer to your first question is: did you try it? – Andrew Li Nov 13 '16 at 02:31
  • Of course, and it does not seem to work. But I thought it possible I was misunderstanding the spec. The specification itself offers only three lines on the `:not()` selector. – Joseph Fraley Nov 13 '16 at 02:32
  • Well, it seems as if you are. Doing `.status-date *` only selects HTML elements, not text nodes. – Andrew Li Nov 13 '16 at 02:36
  • For your example, here is a way to do it: http://codepen.io/anon/pen/MbyEYX . CSS does not access text nodes anyway. Your `:not()` selector, if you drop the text node idea :), works fine: http://codepen.io/anon/pen/WowZvZ – G-Cyrillus Nov 13 '16 at 03:09
  • *"I am doing some web scraping and I'd like to ignore the `<strong>` tag in this case."* - you should probably elaborate further on what you're actually trying to achieve. Anyway, this smells like a job for XPath instead of CSS selectors. – the8472 Nov 13 '16 at 09:03
  • @the8472 I want the well-structured date without the "Date available:" tag. I can strip it using JavaScript before I store the date itself, but it would be convenient if a simple selector could grab it out-of-the-box. – Joseph Fraley Nov 13 '16 at 09:31
  • @GCyrillus thank you, but I specifically said that I was not interested in that solution. I am not actually trying to style anything, only using a scrape tool that uses CSS selectors, so overwriting the `<strong>` styles is not helpful in this case. – Joseph Fraley Nov 13 '16 at 09:33

2 Answers

Simple selectors represent elements. This is true for all simple selectors, including * and :not(). Text is contained by an element, but is not an element in its own right. You won't be able to "match" just the text with any CSS selector, because as far as selectors are concerned, what the DOM calls text nodes don't even exist in the document tree.

"The specification itself offers only three lines on the :not() selector."

The first line in the specification supports this:

"The negation pseudo-class, :not(X), is a functional notation taking a simple selector (excluding the negation pseudo-class itself) as an argument. It represents an element that is not represented by its argument."

Notice that it says "It represents an element".

If you're doing web scraping, consider XPath:

//div[contains(concat(' ', @class, ' '), ' status-date ')]/strong/following-sibling::text()
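For illustration, here is a minimal Python sketch of the same idea. It assumes the markup is well-formed enough to parse as XML and uses only the standard library's `xml.etree.ElementTree`. That module's limited XPath subset does not support `text()` or `following-sibling::`, so the sketch reads the text that follows `<strong>` from the element's `tail` attribute instead of evaluating the expression above. The names `HTML` and `date_after_label` are made up for this example; a full XPath 1.0 engine should be able to run the expression directly.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question (assumed to be well-formed XML).
HTML = """<div class="status-date">
  <strong>Date Available:</strong>
  10/05/2016
</div>"""


def date_after_label(markup: str) -> str:
    """Return the text that follows the <strong> label inside .status-date."""
    root = ET.fromstring(markup)
    # iter() yields the root itself as well as its descendants.
    for div in root.iter("div"):
        classes = (div.get("class") or "").split()
        if "status-date" not in classes:
            continue
        strong = div.find("strong")
        if strong is not None and strong.tail:
            # ElementTree keeps the text node after an element in .tail.
            return strong.tail.strip()
    return ""


print(date_after_label(HTML))  # prints 10/05/2016
```

The point of the sketch is that the date lives in a text node hanging off the `<strong>` element, which is why an element-only selector never reaches it.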
BoltClock
  • Thanks, this is a comprehensive and direct answer to my explicit question, with a concrete suggestion for solving my specific problem. – Joseph Fraley Nov 13 '16 at 09:35

CSS cannot target a text node unless it is wrapped in its own element. Hence, the :not() selector is not capable of doing what you're trying to do. If you're going to scrape information like this, you will have to parse it server-side. For viewing the content you could just set div strong { display: none; }, but that will probably not affect the scraping bit.
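As a rough sketch of what parsing it yourself could look like, the following Python example uses the standard library's `html.parser` to collect the text inside the `.status-date` element while ignoring anything inside `<strong>`. The class `DirectTextCollector` and the function `text_without_tag` are hypothetical names for this example, and the sketch assumes the target element contains no void elements such as `<br>`, which would throw off the simple depth counter.

```python
from html.parser import HTMLParser

# The sample markup from the question.
HTML = """<div class="status-date">
  <strong>Date Available:</strong>
  10/05/2016
</div>"""


class DirectTextCollector(HTMLParser):
    """Collect text inside an element with a given class, skipping one tag."""

    def __init__(self, wanted_class, skip_tag):
        super().__init__()
        self.wanted_class = wanted_class
        self.skip_tag = skip_tag
        self.depth = 0       # > 0 while inside the wanted element
        self.skip_depth = 0  # > 0 while inside the tag to ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
            if tag == self.skip_tag:
                self.skip_depth += 1
        elif self.wanted_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if tag == self.skip_tag and self.skip_depth:
                self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is inside the target but outside the skipped tag.
        if self.depth and not self.skip_depth:
            self.chunks.append(data)


def text_without_tag(markup, wanted_class="status-date", skip_tag="strong"):
    parser = DirectTextCollector(wanted_class, skip_tag)
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()


print(text_without_tag(HTML))  # prints 10/05/2016
```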

junkfoodjunkie