I'm trying to remove single quotes and double quotes around HTML attributes with the following restrictions:
1) The quoted material MUST exist within a tag <> (e.g., <mytag b="yes"> becomes <mytag b=yes>, but <script>var b="yes"</script> stays intact).
2) The quoted material must contain neither a space character nor an equals sign (e.g., <mytag b="no no" c="no=no"> stays intact).
3) The quoted material must not be the value of an href or src attribute.
4) The regex should be good for UTF-8 (duh!)
Someone posted a virtually identical question here that received an answer that works within the confines of the question:
So:
((\S)+\s*(?<!href)(?<!src)(=)\s*)(\"|\')(\S+)(\"|\')
...works, except that it does not confine itself to tags (i.e., text in between opening and closing tags is erroneously edited, e.g. <mytag>b="TheQuotesAreStrippedOutHere!"</mytag> loses its quotes), and it doesn't check for equals signs (=) within the quoted text (e.g. <mytag b="OhNo=TheRoutineRemovedTheQuotesBecauseItDidNotCheckForAnEqualSignInTheQuotedText!"> loses its quotes as well).
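For reference, here is a minimal reproduction of what I'm seeing. It is only a sketch: the function name strip_attr_quotes is mine, and I'm assuming the replacement string is '$1$5' (keep the attribute name plus "=", then the bare value). The pattern itself is the one quoted above, wrapped in # delimiters with the u flag for UTF-8.

```php
<?php
// Hypothetical helper name; the pattern is the one quoted in the question.
// Assumption: the replacement keeps group 1 (name and "=") and group 5 (bare value).
function strip_attr_quotes(string $html): string
{
    $pattern = '#((\S)+\s*(?<!href)(?<!src)(=)\s*)(\"|\')(\S+)(\"|\')#u';
    return preg_replace($pattern, '$1$5', $html);
}

$cases = array(
    '<mytag b="yes">',              // should change, and does
    '<a href="x.html">',            // should stay, and does
    '<mytag b="no no">',            // should stay, and does
    '<script>var b="yes"</script>', // should stay, but the quotes are removed
    '<mytag b="no=no">',            // should stay, but the quotes are removed
);

foreach ($cases as $case) {
    echo $case, ' => ', strip_attr_quotes($case), PHP_EOL;
}
```

The first three cases behave as I want; the last two are the failures I need fixed.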
Bonus points: I would like to integrate this into the following PHP HTML minification routine, which works well apart from the edits described above:
https://gist.github.com/tovic/d7b310dea3b33e4732c0
Its author pairs the patterns and replacements in two arrays, as you'll see, so I need to conform to that syntax, which uses # as the regex delimiter, etc.
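To show the shape I need to fit into, here is a rough sketch of that calling convention. It is not copied from the gist: the function name apply_pairs and both example patterns are placeholders I made up, and only the structure (two parallel arrays passed to preg_replace, # as the delimiter) is the point.

```php
<?php
// Hypothetical helper: NOT the gist's code, only the same calling shape.
function apply_pairs(string $html): string
{
    // Each pattern uses "#" as its delimiter; "u" treats the input as UTF-8.
    $patterns = array(
        '#<!--.*?-->#su', // placeholder rule: drop HTML comments
        '#\s{2,}#u',      // placeholder rule: collapse whitespace runs to one space
    );
    // The replacement at index N belongs to the pattern at index N.
    $replacements = array(
        '',
        ' ',
    );
    // preg_replace applies the pairs in order, each to the result of the previous one.
    return preg_replace($patterns, $replacements, $html);
}

echo apply_pairs('<p>a   b</p><!-- x -->'), PHP_EOL;
```

So an answer would ideally be one more pattern string plus its matching replacement string that I can append to the two arrays.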
Your solution gets my upvote!