
Possible Duplicates:
Parsing CSS in JavaScript / jQuery
Parsing CSS string with RegEx in JavaScript

How can I find out if a string contains CSS rules?

Example rules:

selector {
  property:value;
}


selector { property:value; }

selector{property:value}

...

Basically I want to find out if a text block represents either PHP + HTML or CSS code.

One approach I was considering: trim the text, then check whether its first character is # or ., or whether it starts with an element selector such as body or p. Do you think that's a good idea?

Alex
  • [This](http://www.catswhocode.com/blog/10-regular-expressions-for-efficient-web-development) article has a regex for pulling out CSS attributes. That may be modifiable to suit your purpose. The regex is: `\s(?[a-zA-Z-]+)\s[:]{1}\s*(?[a-zA-Z0-9\s.#]+)[;]{1}` – Chad Jan 20 '12 at 21:02
  • Not sure how concise this is, never having used it myself, but it looks promising... http://www.catswhocode.com/blog/10-regular-expressions-for-efficient-web-development – Reinstate Monica Cellio Jan 20 '12 at 21:02
  • It's not a good idea to trim the text and try and match the selectors. You'd need to match the list of every HTML element out there. – Alex Turpin Jan 20 '12 at 21:05

2 Answers


tldr; Consider using a proper CSS parser, such as JSCSSP, for final validation.

It depends on the goal; a regular expression might be entirely the wrong tool.

If this is just an attempt to see whether the text could contain CSS rules, then I might be inclined to try an overly broad match. It will fail if there is anything complicated in the CSS string values (like "}") or if there are CSS comments, and it will accept a wide range of input that is not valid CSS:

(?:\s*\S+\s*{[^}]*})+       // use anchored
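A minimal sketch of using that pattern anchored, in JavaScript. The helper name looksLikeCss is hypothetical, and the pattern is slightly tightened as an assumption: `\S+` is replaced by `[^\s{}]+` so that a selector cannot swallow braces, which keeps the match from backtracking ambiguously on long input. It is still only a heuristic: selectors containing spaces (such as "body p") are rejected, and non-CSS text of the same shape is accepted.

```javascript
// Heuristic check only, not a CSS validator.
// Anchored form of the broad pattern: one or more "selector { ... }" blocks
// and nothing else apart from surrounding whitespace.
// Assumption: [^\s{}]+ stands in for \S+ so selectors cannot contain braces.
const CSS_BLOCKS = /^(?:\s*[^\s{}]+\s*\{[^}]*\})+\s*$/;

// Hypothetical helper: true if the whole text is shaped like CSS rule blocks.
function looksLikeCss(text) {
  return CSS_BLOCKS.test(text);
}

console.log(looksLikeCss("selector {\n  property:value;\n}")); // true
console.log(looksLikeCss("selector{property:value}"));          // true
console.log(looksLikeCss("<p>Hello</p>"));                      // false
```

For anything beyond a quick guess, a real parser remains the safer choice, as noted at the top of this answer.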

Likewise, here is an expression that should detect most simple HTML with tags (however invalid), while matching CSS only in unlucky cases (matches inside comments or CSS strings, or unusual child selectors):

<(?:br|p)[^>{]*>|</\w+\s*>   // use case-insensitive
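The two checks could be combined into a rough classifier along these lines. This is a sketch under assumptions: guessKind and its return labels are hypothetical names, the literal "<?php" test is an addition not taken from the patterns above, and the CSS pattern is an unanchored, slightly tightened variant of the broad one. The HTML test runs first because markup is the less ambiguous signal.

```javascript
// Rough guess at whether a text block is PHP/HTML or CSS. Heuristic only.
// HTML pattern from above, case-insensitive: a <br>/<p...> tag or any closing tag.
const HTML_TAG = /<(?:br|p)[^>{]*>|<\/\w+\s*>/i;
// Unanchored CSS-like block: something, then { ... }.
// Assumption: [^\s{}]+ is used in place of \S+.
const CSS_BLOCK = /[^\s{}]+\s*\{[^}]*\}/;

// Hypothetical helper; returns "php+html", "css", or "unknown".
function guessKind(text) {
  // Assumption: a literal PHP open tag counts as PHP + HTML.
  if (text.includes("<?php") || HTML_TAG.test(text)) return "php+html";
  if (CSS_BLOCK.test(text)) return "css";
  return "unknown";
}

console.log(guessKind("<div>text</div>"));          // "php+html"
console.log(guessKind("selector{property:value}")); // "css"
console.log(guessKind("hello world"));              // "unknown"
```

Neither regex carries the g flag, so repeated calls to test() do not depend on lastIndex state.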

Happy coding.

Also see: Parsing CSS in JavaScript / jQuery


http://arxiv.org/abs/1106.4064 might interest you.

Algorithmic Programming Language Identification

David Klein, Kyle Murray, Simon Weber

(Submitted on 21 Jun 2011 (v1), last revised 9 Nov 2011 (this version, v2))

Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a Bayesian classifier. Code is written in Python and available under an MIT license.

Mike Samuel