Generally I'd match HTML attributes with this regex:
\w+=".*?"
but when the HTML contains PHP code it gets kind of dicey. Please consider the following tag:
<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>>
<?php echo $img; ?>
</option>
The above regex will also match the attribute selected="selected"
which is only emitted conditionally by the PHP logic. Is there a way to match attributes which are not inside PHP tags while still matching the ones whose value may contain PHP code? If not, could I just remove the PHP code which isn't part of an attribute value?
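
To make the problem concrete, here is a minimal sketch in Python. The choice of Python's re module is an assumption, since the regex flavor in use is not stated above, and the names NAIVE, SKIP_PHP and static_attributes are hypothetical. It first reproduces the unwanted match, then tries one possible approach: let the pattern consume whole PHP blocks as one alternative and capture attributes only in the other, so that anything inside a stand-alone PHP block is never reported.

```python
import re

# The tag from the question, on a single line.
TAG = """<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>>"""

# The simple pattern: attribute name, equals sign, lazily matched quoted value.
NAIVE = re.compile(r'\w+=".*?"')

# Alternative 1: a whole PHP block, consumed but not captured.
# Alternative 2: an attribute whose value may itself contain PHP blocks
#                (captured in group 1).
SKIP_PHP = re.compile(
    r'<\?php.*?\?>'
    r'|(\w+="(?:<\?php.*?\?>|[^"])*")',
    re.DOTALL,
)

def static_attributes(html):
    """Return attributes that are not located inside a stand-alone PHP block."""
    return [m.group(1) for m in SKIP_PHP.finditer(html) if m.group(1)]

naive_matches = NAIVE.findall(TAG)
filtered_matches = static_attributes(TAG)

print(naive_matches)     # includes selected="selected"
print(filtered_matches)  # only the value attribute
```

This sketch assumes that no PHP string literal contains the sequence ?> and that attribute values are always double-quoted; neither assumption is checked by the pattern.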
EDIT: Here's what I have so far:
\w+="(((.(?!<\?php))*?)|((.((?=<\?php).*?(?=\?>))*)*?))*"
Which basically means: match one or more word characters (the attribute name), followed by an equals sign and a double quote, and then repeatedly match either of the following two alternatives, consuming as many characters as possible:
- A sequence of characters which does not contain the string
<?php
- A sequence of characters containing the pattern
<\?php.*?\?>
In other words, greedily match the value part of the attribute together with all of its PHP code, until a closing double quote is encountered.
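
The two alternatives described above can probably be expressed more simply. The following is a minimal sketch, again assuming Python's re module (the actual regex flavor is not stated), with the hypothetical names VALUE_WITH_PHP, NAIVE and SAMPLE. Inside the quotes it accepts either a complete PHP block or a single character that is neither a double quote nor the start of a PHP block.

```python
import re

# Attribute name, equals sign, opening quote, then any number of:
#   - a complete <?php ... ?> block (which may contain double quotes), or
#   - one character that is not a double quote and does not start "<?php",
# up to the closing quote.
VALUE_WITH_PHP = re.compile(
    r'\w+="(?:<\?php.*?\?>|(?!<\?php)[^"])*"',
    re.DOTALL,
)

# The simple pattern, for comparison.
NAIVE = re.compile(r'\w+=".*?"')

# Hypothetical sample: the PHP code inside the value contains double quotes.
SAMPLE = '<input value="<?php echo "x"; ?>" class="big">'

php_aware = VALUE_WITH_PHP.findall(SAMPLE)
naive = NAIVE.findall(SAMPLE)

print(php_aware)  # the value attribute is matched in full
print(naive)      # the value attribute is cut off at the first inner quote
```

Note that this only covers the value part: on its own it does not skip PHP blocks that sit between attributes, and it assumes that ?> never appears inside a PHP string literal.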