How to match HTML "selected" option with preg_match

Question

Browsers treat an <option> as selected by default if it has the selected="selected" attribute. But this also works when the attribute value is omitted.

So this works:

<option selected="selected" value="1">value text</option>

and so does this:

<option selected value="1">value text</option>

My question is how to write a regex pattern that matches both cases above but never matches something like:

<option value="the devil with selected ">value text</option>

EDIT: I forgot to mention that some other forms are also accepted by browsers, such as selected='selected', selected=selected, or even selected=SelEctEd.
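One way to meet these constraints with a single pattern (a sketch under stated assumptions, not a vetted parser) is to walk the tag one attribute at a time, consuming each quoted value as a whole, so that `selected` can only ever match in attribute-name position. Whatever value follows it (quoted, unquoted, any letter case) is accepted, because it is the presence of the attribute that matters. The constant `SELECTED_OPTION_RE` and the function `hasSelectedOption()` are hypothetical names for this example. The pattern assumes reasonably well-formed tags: it knows nothing about comments, `<script>` content, or unterminated quotes.

```php
<?php
// Hypothetical constant: a sketch that assumes reasonably well-formed <option> tags.
// Each attribute (name plus optional quoted or bare value) is consumed whole,
// so text inside a quoted value can never be mistaken for an attribute name.
const SELECTED_OPTION_RE = <<<'RE'
~<option\b
  (?:                                    # zero or more attributes before "selected"
    \s+ [^\s=>"'/]+                      # attribute name
    (?: \s*=\s* (?: "[^"]*" | '[^']*' | [^\s"'>]+ ) )?   # optional value: double-quoted, single-quoted or bare
  )*?
  \s+ selected (?=[\s=/>])               # an attribute named exactly "selected"
~ix
RE;

// Hypothetical helper: true if the string contains an <option> tag with a selected attribute.
function hasSelectedOption(string $html): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match(SELECTED_OPTION_RE, $html) === 1;
}

$samples = [
    '<option selected="selected" value="1">value text</option>',
    '<option selected value="1">value text</option>',
    "<option selected='selected' value=\"1\">value text</option>",
    '<option selected=selected value="1">value text</option>',
    '<option value="1" selected=SelEctEd>value text</option>',
    '<option value="the devil with selected ">value text</option>',
];

foreach ($samples as $html) {
    printf("%-65s %s\n", $html, hasSelectedOption($html) ? 'match' : 'no match');
}
```

For these six samples it reports a match for the first five and no match for the last one: the quoted value `"the devil with selected "` is swallowed by the `"[^"]*"` branch, so the word inside it is never tested as an attribute name.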

I know that regular expressions are far from perfect (if useful at all) for parsing XHTML, but in my case there's no way to use other tools like an XML parser. — doc_id, Dec 22 '15 at 13:09
Sorry to say this, but I won't do any thinking until I see your own attempt that doesn't work. You know exactly what's supposed to go in and what should come out, so I see no reason to do your work for you. ;) — dryman, Dec 22 '15 at 13:15
Empty attributes are [quite well standardized](http://www.w3.org/TR/html-markup/syntax.html#syntax-attr-empty), and even recommended for those attributes. In X(HT)ML this [is not allowed](http://www.w3.org/TR/2000/REC-xhtml1-20000126/#h-4.5) however. — Niels Keurentjes, Dec 22 '15 at 13:23
I know this isn't regex, but you are using PHP, and it's [so simple with DOMDocument; here is some example code](https://3v4l.org/fllP7). — Dean Taylor, Dec 22 '15 at 14:09

score 0 · Answer 1 · answered Dec 22 '15 at 15:00

With PCRE (which PHP uses) this works:

<option.*?\s(?:selected(?:=\"selected\")?)\s.*?>
# look for <option literally
# followed by anything (non greedy) and a whitespace(!)
# open a non-capturing group and look for selected, optionally followed by ="selected"
# close the group, followed by a whitespace
# followed by anything (non-greedy) and the closing >

See a regex101 demo here. Also, read the comments; there are good hints in there (using DOMDocument, etc.).
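A small PHP check makes it easy to see where this pattern succeeds and where it falls short. This is a sketch: `$jan` and `$cases` are just local names, the pattern is used verbatim inside `~` delimiters, and the inputs are the question's examples (the negative one written without bold markers) plus a `selected` placed last in the tag. Because `.*?` is free to run into attribute values, the pattern also matches the negative example; because it demands whitespace right after `selected` and only knows the double-quoted value, it misses `selected='selected'` and a trailing `selected>`.

```php
<?php
// The answer's pattern, unchanged, wrapped in ~ delimiters for preg_match().
// In a single-quoted PHP string \" stays as-is, and PCRE reads \" as a literal ".
$jan = '~<option.*?\s(?:selected(?:=\"selected\")?)\s.*?>~';

$cases = [
    'double-quoted value' => '<option selected="selected" value="1">value text</option>',
    'no value'            => '<option selected value="1">value text</option>',
    'single-quoted value' => "<option selected='selected' value=\"1\">value text</option>",
    'selected last'       => '<option value="1" selected>value text</option>',
    'word inside a value' => '<option value="the devil with selected ">value text</option>',
];

foreach ($cases as $label => $html) {
    // preg_match() returns 1 on a match, 0 on no match (false on error)
    echo str_pad($label, 22), preg_match($jan, $html) === 1 ? 'match' : 'no match', PHP_EOL;
}
```

Run as-is, it reports a match for the first two cases and for the last one, and no match for the single-quoted and `selected>` forms.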

Jan

Unfortunately that matches `` – doc_id Dec 22 '15 at 15:50
True. Can you give an overview of your expected input strings then? And **why** exactly is using a decent DOM parser not an option? – Jan Dec 22 '15 at 17:37
I deal with documents that might have broken or incorrectly formatted tags, for an automated test process. It's a complicated scenario. But overall I decided to give up on regex and go with DOMDocument; I did not know it would handle incorrect XHTML that well. As for your other question, I mentioned in the question that all I want is to detect that attribute in any format a typical browser would. – doc_id Dec 22 '15 at 18:40

score 0 · Accepted Answer · edited May 23 '17 at 11:59

After the discussion here and other resources such as "RegEx match open tags except XHTML self-contained tags", I realized it is impractical to parse XHTML accurately with regular expressions.
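For completeness, here is a minimal sketch of the DOMDocument route suggested in the comments. It is independent of the linked 3v4l example. It loads the markup with `DOMDocument::loadHTML()`, which tolerates sloppy HTML, keeps libxml's parser warnings quiet with `libxml_use_internal_errors()`, and asks each `<option>` whether it carries a `selected` attribute via `hasAttribute()`. The function name `selectedOptionValues()` is hypothetical, and returning the `value` attributes is just one possible way to report the result.

```php
<?php
// Hypothetical helper: returns the value attribute of every <option> that carries
// a selected attribute, in any spelling (selected, selected="selected", selected=SelEctEd, ...).
function selectedOptionValues(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for broken markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ($doc->getElementsByTagName('option') as $option) {
        // Presence of the attribute is what counts; its value is irrelevant.
        if ($option->hasAttribute('selected')) {
            $values[] = $option->getAttribute('value');
        }
    }
    return $values;
}

$html = '<select>'
      . '<option value="the devil with selected ">a</option>'
      . '<option selected value="1">b</option>'
      . "<option selected='selected' value=\"2\">c</option>"
      . '<option value="3" selected=SelEctEd>d</option>'
      . '</select>';

print_r(selectedOptionValues($html));
```

Because the parser tokenizes attributes itself, the word inside the first option's value is never treated as an attribute, and the three genuinely selected options are reported with values 1, 2 and 3.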

answered Jan 04 '16 at 00:35

doc_id
