
I am trying to parse the content property of a CSS pseudo-element in JavaScript. The declaration can look like this:

content: counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz);

To parse this content I am using the regex below (the parenthesis-matching logic is copied from the internet):

 counter\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

This selects every counter(...) call, including ones with nested parentheses, but counter cannot have nested parentheses (as far as I know; correct me if I am wrong). I am using the same regex to select the other kinds of content:

  1. Attr : attr\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

  2. Quotes: openQuote\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

  3. String: anything inside double or single quotes (my current regex, ".*", is not working)

I have the following questions:

  1. A regex to match single parenthesis (no nested parenthesis is possible in the pseudo selector content property).

  2. A single regex that will match the counter, attribute, url and string content in the given order (order is important because I want to replace them later with evaluated values).

Please let me know if any more information is required from my side. Thanks

Pavan Tiwari

1 Answer


Your first regex does indeed match nested parentheses (but not escaped parentheses). Is that desirable?

Without nesting or escaping, these become much simpler.
Here's a variant of your first regex that ignores nesting possibilities:

counter\([^)]*\)

It matches a literal counter( and then zero or more non-close-parentheses, then finally a close parenthesis. (Full explanations of your first regex and my simpler version at regex101.)
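As a quick check, here is a minimal sketch that runs both patterns over the sample value from the question. The variable names (`sampleValue`, `simpleRe`, `nestedRe`) are made up for illustration. On input without nesting the two patterns agree; they only differ when parentheses are nested.

```javascript
// Sample value taken from the question (without the "content:" prefix).
const sampleValue = `counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz)`;

// Simple pattern: "counter(", any run of characters except ")", then ")".
const simpleRe = /counter\([^)]*\)/g;
// The question's pattern, which tolerates parentheses nested up to two levels deep.
const nestedRe = /counter\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)/g;

// With the g flag, match() returns every match in source order (or null if none).
const simpleHits = sampleValue.match(simpleRe) || [];
const nestedHits = sampleValue.match(nestedRe) || [];
console.log(simpleHits); // [ 'counter(item)', 'counter(item1,decimal)' ]
console.log(nestedHits); // same two matches

// The patterns only disagree on nested input, which CSS counter() does not allow anyway.
console.log('counter(a(b))'.match(simpleRe)); // [ 'counter(a(b)' ]
console.log('counter(a(b))'.match(nestedRe)); // [ 'counter(a(b))' ]
```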

I believe that answers your first question, though if you're literally looking for a "regex to match [a] single parenthesis," that's just [()], which will match either an open or a close parenthesis character. You could alternatively explicitly match \( or \) if you know which one you want to match.

Matching quotes (without regard to nesting or escaped quotes) is similarly easy:

"[^"]*"

This matches a literal double quote character ("), then zero or more non-doublequote characters, then another literal double quote character.
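The question's ".*" attempt fails because .* is greedy: it runs from the first double quote to the last one on the line. A small sketch of the difference, using the sample value from the question (the variable names are only for illustration, and the last pattern adds single-quoted strings as an assumption about what you need):

```javascript
const quotedSample = `counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz)`;

// Greedy: one big match spanning from the first '"' to the last '"'.
const greedyHits = quotedSample.match(/".*"/g);
console.log(greedyHits.length); // 1

// Negated character class: each match stops at the next double quote.
const doubleHits = quotedSample.match(/"[^"]*"/g);
console.log(doubleHits); // [ '" "', '""', '"hi"' ]

// Double- or single-quoted strings, still without escape handling.
const anyQuoteHits = quotedSample.match(/"[^"]*"|'[^']*'/g);
console.log(anyQuoteHits); // [ '" "', '""', "'test.jpeg'", '"hi"' ]
```

Note that the last pattern also picks up the quoted argument inside url(...), which is one reason to match the function calls and the strings in a single alternation instead.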

Your second request was for a "single regex that will match the counter, attribute , url and string content in the given order (order is important because i want to replace them later with evaluated values)."

I'm not sure how you intend to get the CSS content property's value, given that it typically lives on an ::after or ::before pseudo-element, which is not part of the DOM tree, but here's some dummy code populating it so we can manipulate it:

var css = `content: counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz); color:red;`;

// harvest last `content` property (this is tricked by `content: "content: blah"`)
var content = css.match(/.*\bcontent:\s*([^;"']*(?:"[^"]*"[^;"']*|'[^']*'[^;"']*)*)/);
if (content) {
  var part, part_re = /(?:"([^"]*)"|'([^']*)'|(?:counter|attr|url)\(([^)]*)\))/g;
  while ( (part = part_re.exec(content[1])) !== null ) { // parse on just the value
    if      (part[0].match(/^"/))       { /* do stuff to part[1] */ }
    else if (part[0].match(/^'/))       { /* do stuff to part[2] */ }
    else if (part[0].match(/^counter/)) { /* do stuff to part[3] */ }
    else if (part[0].match(/^attr/))    { /* do stuff to part[3] */ }
    else if (part[0].match(/^url/))     { /* do stuff to part[3] */ }

    // silently skips other values, like `open-quote` or `counters(name, string)`
  }
}

The first regex (line 4) extracts the last content property from the CSS (last because it'll override previous instances, though note that this'll stupidly extract blah from content: "content: blah"). After finding the last instance of a word break followed by content:, it absorbs any whitespace and then matches the rest of the line up to a semicolon, double quote, or single quote. A non-capture group then allows for any content between double quotes or between single quotes, much in the same way we matched quotes near the top of this answer, so a semicolon inside a quoted string does not end the value. (Full explanation of this CSS content regex at regex101.)
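To see that behaviour in isolation, here is a small sketch that wraps the same regex in a helper. The names `declRe` and `lastContentValue` are hypothetical, and the inputs are made up to exercise the last-declaration rule, a semicolon inside a string, and the known false positive.

```javascript
// Regex from the answer: capture the value of the last `content:` on the line.
const declRe = /.*\bcontent:\s*([^;"']*(?:"[^"]*"[^;"']*|'[^']*'[^;"']*)*)/;

// Hypothetical helper: returns the captured value, or null when nothing matches.
function lastContentValue(css) {
  const m = css.match(declRe);
  return m ? m[1] : null;
}

// The later declaration wins, and the ";" inside the quoted string is kept.
console.log(lastContentValue(`content: "a"; color: red; content: counter(x) "b;c";`));
// counter(x) "b;c"

// Known weakness: a string that itself contains "content:" fools the regex.
console.log(lastContentValue(`content: "content: blah";`)); // blah

// No content declaration at all.
console.log(lastContentValue(`color: red;`)); // null
```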

The second regex (line 6, assigned to part_re) is run in a while loop (line 7) so we can work on each individual value in the content property in order. It matches double-quoted strings or single-quoted strings or certain named values (counter or attr or url). See the conditionals and comments for where the values' data are stored. Full explanation of this value parsing regex at regex101 (see "Match Information" in the middle of the right column to see how I'm storing the values' data).
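Since the stated goal is to replace each piece with an evaluated value, here is a rough sketch of how that could go, assuming you already have the counter values and attribute values on hand. Everything named here (`tokenRe`, `tokenize`, `evaluate`, and the shape of the `ctx` object) is hypothetical and not part of any library. The regex is the same alternation as part_re, except that the function name gets its own capture group.

```javascript
// Groups: 1 = double-quoted text, 2 = single-quoted text, 3 = function name, 4 = raw arguments.
const tokenRe = /"([^"]*)"|'([^']*)'|(counter|attr|url)\(([^)]*)\)/g;

// Split a content value into ordered tokens. Anything else is silently skipped.
function tokenize(value) {
  const tokens = [];
  for (const m of value.matchAll(tokenRe)) {
    if (m[1] !== undefined) {
      tokens.push({ type: 'string', value: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ type: 'string', value: m[2] });
    } else {
      tokens.push({ type: m[3], args: m[4].split(',').map(s => s.trim()) });
    }
  }
  return tokens;
}

// Assumed context shape: { counters: { name: number }, attrs: { name: string } }.
function evaluate(value, ctx) {
  return tokenize(value).map(t => {
    switch (t.type) {
      case 'string':
        return t.value;
      case 'counter': // the optional list-style argument (e.g. "decimal") is ignored here
        return String(t.args[0] in ctx.counters ? ctx.counters[t.args[0]] : 0);
      case 'attr':
        return t.args[0] in ctx.attrs ? ctx.attrs[t.args[0]] : '';
      default: // url(...) is an image, so it contributes no text in this sketch
        return '';
    }
  }).join('');
}

const demoValue = `counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz)`;
const demoCtx = { counters: { item: 3, item1: 7 }, attrs: { data: 'D', xyz: 'Z' } };
console.log(tokenize(demoValue).map(t => t.type));
// [ 'counter', 'string', 'attr', 'string', 'counter', 'url', 'string', 'attr' ]
console.log(evaluate(demoValue, demoCtx)); // 3 D7hiZ
```

Checking the capture groups against undefined, rather than testing how the match starts, keeps empty strings such as "" from being mistaken for a missing group.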

Adam Katz
  • Thanks for your response, Counter can not have nested parenthesis (i.e. counter(item (somevalue)) is not possible), It can only have one open and closing parenthesis. Css content property can have any combination of counter, attribute, string, url and quotes in any order. I need regex to parse the same in give result in the given order later to replace with actual value. Hope this helps you. If you need more information, Please let me know – Pavan Tiwari Jul 05 '18 at 15:47
  • @PavanTiwari – Are you assuming a valid [CSS content](https://developer.mozilla.org/en-US/docs/Web/CSS/content) declaration? I'm not going to help you build a fully-featured parser by the spec, but I can help you with the four items you've requested. For more than that, you should use a real CSS parser like [`HTMLElement.style`](https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/style) or [`getComputedStyle()`](https://developer.mozilla.org/en-US/docs/Web/API/Window/getComputedStyle). – Adam Katz Jul 05 '18 at 17:05
  • Thanks again. Javascript doesn't allows us to directly access the value of pseudo elements. So I am building a framework which will first parse the content and then replace them with evaluated value. GetComputedStyle will only give use declared value of css content not the evaluated one. – Pavan Tiwari Jul 08 '18 at 04:18
  • Are you sure? That's literally why it's called get _Computed_ Style; it "reports the values of all CSS properties of an element after applying active stylesheets and resolving any basic computation those values may contain." HTMLelement.style is the one that will only give you the declared value of CSS content. (The issue here is separate: the DOM doesn't give access to pseudo-elements, so you can't get their computed styles either.) – Adam Katz Jul 09 '18 at 15:23
  • is it possible to create an array of value by using /.*\bcontent:\s*([^;"']*(?:"[^"]*"[^;"']*|'[^']*'[^;"']*)*)/ like [counter(item, decimal), ' ', attr(data)] . for input string content:counter(item, decimal)' ' attr(data) – Pavan Tiwari Jul 09 '18 at 17:12
  • `arr = content[1].match(/"[^"]*"|'[^']*'|(?:counter|attr|url)\([^)]*\)/g)` will do that for you (I merely stripped out all unescaped parentheses except those grouping `counter|attr|url`). I figured you actually wanted to specifically parse the contents, which my answer matches explicitly; `part[1]` and `part[2]` are the strings without their surrounding quotes and `part[3]` is the list of arguments inside the function call, without the function call or parentheses. – Adam Katz Jul 09 '18 at 20:56
  • Thanks a lot Adam – Pavan Tiwari Jul 11 '18 at 08:45
  • One more case is there when string content is present but there is no quotes. – Pavan Tiwari Jul 27 '18 at 09:24
  • One more case is there when string content is present but there is no quotes. for example "content:counter(item, decimal), ' ', counter(item, decimal), test". How can i convert this into an array. – Pavan Tiwari Jul 27 '18 at 09:30