I'm trying to analyze HTML code and extract all CSS classes and IDs from the source. So I need to extract whatever is between two quotation marks when it is preceded by either class or id:
id="<extract this>"
class="<extract this>"
/(?:id|class)="([^"]*)"/gi
Replacement expression: $1
This regex in English: match either "id" or "class", then an equals sign and an opening quote, then capture everything that is not a quote, then match the closing quote. Do this globally and case-insensitively.
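As a minimal sketch of how that pattern could be applied in JavaScript: `String.prototype.matchAll` yields every match together with its capture group. The helper name `extractIdsAndClasses` and the sample markup are made up for illustration and are not part of the question.

```javascript
// Hypothetical helper: collect the quoted value of every id/class attribute.
function extractIdsAndClasses(html) {
  const pattern = /(?:id|class)="([^"]*)"/gi;
  // matchAll requires the g flag; match[1] is the first capture group.
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

// Sample markup, assumed for illustration only.
const sampleHtml = '<div id="main" class="box wide"><p CLASS="note">hi</p></div>';

console.log(extractIdsAndClasses(sampleHtml));
// → [ 'main', 'box wide', 'note' ]
```

Note that a class attribute holding several names comes back as one string ("box wide"); splitting it on whitespace would be a separate step.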
Since you prefer using a regular expression, here is one way to do it.
\b(?:id|class)\s*=\s*"([^"]*)"
Regular expression:
\b # the boundary between a word char (\w) and not a word char
(?: # group, but do not capture:
id # 'id'
| # OR
class # 'class'
) # end of grouping
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
= # '='
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
" # '"'
( # group and capture to \1:
[^"]* # any character except: '"' (0 or more times)
) # end of \1
" # '"'
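To see what the `\b` and `\s*` parts change in practice, here is a rough comparison in JavaScript against the stricter pattern `(?:id|class)="([^"]*)"`. The `gi` flags, the helper `captures`, and the sample strings are assumptions added for this sketch.

```javascript
// Stricter pattern: no whitespace allowed around "=", no word boundary.
const strict = /(?:id|class)="([^"]*)"/gi;
// Pattern from this answer, with g and i flags added for the demo.
const tolerant = /\b(?:id|class)\s*=\s*"([^"]*)"/gi;

// Hypothetical helper: return the first capture group of every match.
function captures(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Whitespace around "=" is only handled by the tolerant pattern.
const spaced = '<div id = "main" class= "box">';
console.log(captures(strict, spaced));   // → []
console.log(captures(tolerant, spaced)); // → [ 'main', 'box' ]

// \b stops "id" from matching at the end of a longer attribute name.
const longer = '<span uuid="abc">';
console.log(captures(strict, longer));   // → [ 'abc' ]
console.log(captures(tolerant, longer)); // → []

// Caveat: "-" is not a word character, so data-id still matches.
console.log(captures(tolerant, '<li data-id="7">')); // → [ '7' ]
```

If hyphenated attributes such as `data-id` need to be excluded, the pattern would have to check the preceding character explicitly; `\b` alone does not do that.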
You may want to try this:
<?php
$css = <<< EOF
id="<extract this>"
class="<extract this>"id="<extract this2>"
class="<extract this3>"id="<extract this4>"
class="<extract this5>"id="<extract this6>"
class="<extract this7>"id="<extract this8>"
class="<extract this9>"
EOF;
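// Capture the quoted value of every id or class attribute.
// Flags: s lets . match newlines, i makes the match case-insensitive.
preg_match_all('/(?:id|class)="(.*?)"/si', $css, $classes, PREG_PATTERN_ORDER);
// $classes[1] holds the first capture group of each match.
foreach ($classes[1] as $value) {
    echo $value . "\n";
}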
/*
<extract this>
<extract this>
<extract this2>
<extract this3>
<extract this4>
<extract this5>
<extract this6>
<extract this7>
<extract this8>
<extract this9>
*/
?>
DEMO:
http://ideone.com/Nr9FPt