Required link: RegEx match open tags except XHTML self-contained tags
In English, what it does is this:
<
matches the opening angle bracket of an HTML open tag
\s*
matches any amount of whitespace (tabs, spaces, newlines)
(?
is something to not worry about - it's a subgroup but it doesn't store the value
The next lump is possible values for open tags - applet, embed, etc
The () around the values mean "store this value in a subpattern, and make it available as part of my results"
The | means "or", so applet or embed, etc - this looks at tag names
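To make the difference between the two kinds of group concrete, here is a small sketch in Python. It assumes the (? described above is the non-capturing form (?: and uses a cut-down, hypothetical tag list (applet, embed) rather than the full one from the question.

```python
import re

# Hypothetical, shortened tag list; the regex in the question has more alternatives.
capturing = re.compile(r"<\s*(applet|embed)")        # () stores the tag name
non_capturing = re.compile(r"<\s*(?:applet|embed)")  # (?:) groups without storing

sample = "< embed src='a.swf'>"

m1 = capturing.search(sample)
m2 = non_capturing.search(sample)

print(m1.group(1))  # the tag name picked out by the | alternatives
print(m2.groups())  # nothing stored, so this is an empty tuple
```

Both patterns match the same text; the only difference is whether the tag name is available in the results afterwards.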
\s*
more whitespace
.?
means "any amount of anything" except for newlines, but because of the Singleline
flag (see comments for this answer) it matches newlines as well
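A quick way to see what the Singleline flag changes is to run the same pattern with and without it. This is only a sketch in Python, where the equivalent flag is called re.DOTALL. The lazy .*? form and the sample tag are assumptions for illustration, not taken from the question.

```python
import re

# A tag whose attributes start on the next line.
text = "<img\n src='a.png'>"

# Without the flag, "." refuses to cross the newline, so nothing matches.
without_flag = re.search(r"<img.*?src", text)

# With re.DOTALL (Python's counterpart of Singleline), "." matches the newline too.
with_flag = re.search(r"<img.*?src", text, re.DOTALL)

print(without_flag)
print(with_flag.group(0))
```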
(?
again, see above; the same goes for the possible values (src, href) - these are the tag
attributes
\s=\s*
means "a single whitespace character, followed by an equals sign, followed by any amount of whitespace"
([\\"\\'])
the (), see above. The [] mean "any one of these characters", and the \\" and \\' are the " and ' characters, escaped with backslashes
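Here is a minimal Python sketch of that quote-matching piece on its own. The sample strings are made up, and the backslash escaping is dropped because a raw triple-quoted string doesn't need it.

```python
import re

# One character, which must be either " or ', stored as group 1.
quote = re.compile(r"""(["'])""")

double = quote.search('src="a.png"')
single = quote.search("href='b.html'")

print(double.group(1))
print(single.group(1))

# The [] stands for exactly one character, so two quote characters
# in a row are not a full match for it.
two_quotes = quote.fullmatch("\"'")
print(two_quotes)
```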
(?.?)
we already know (?, and the .? means "optionally, a single one of any character"
The options at the end are modifiers that change how the regex matches: IgnoreCase makes it case-insensitive, Singleline lets . match newlines (as described above), and someone else will have to tell you what Compiled means, because I don't know the language the regex is written for :)
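For the IgnoreCase part, a small Python sketch (re.IGNORECASE is Python's name for the same idea; the pattern and input are hypothetical):

```python
import re

pattern = r"<\s*(embed|img)"
shouty_html = "<IMG SRC='a.png'>"

# Case-sensitive by default: the lowercase "img" does not match "IMG".
strict = re.search(pattern, shouty_html)

# With the flag, letter case is ignored.
relaxed = re.search(pattern, shouty_html, re.IGNORECASE)

print(strict)
print(relaxed.group(1))
```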
Edit: You've just updated the first post a little. The <Tag> and <AttributeName> give the match groups a name, so for example, your result of running the regex might look like this:
Array
- Tag = img
- AttributeName = src
- FileOrImage = http://www.mysite.com/a.png
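If you want to see a result like that for yourself, here is a rough Python reconstruction. Treat it as a sketch under assumptions: the overall pattern shape, the shortened tag and attribute lists, and the closing \3 backreference are my guesses at the regex in the question. Python also spells named groups (?P<Name>...) rather than (?<Name>...).

```python
import re

# Assumed reconstruction of the pattern; the lists are shortened for the example.
pattern = re.compile(
    r"""<\s*(?P<Tag>applet|embed|img)\s*.*?(?P<AttributeName>src|href)\s*=\s*(["'])(?P<FileOrImage>.*?)\3""",
    re.IGNORECASE | re.DOTALL,
)

html = '<img class="x" src="http://www.mysite.com/a.png">'

match = pattern.search(html)
result = match.groupdict()  # only the named groups, keyed by name

for name, value in result.items():
    print(f"- {name} = {value}")
```

The unnamed ([\"']) group still gets a number (3 here), which is what lets \3 insist that the closing quote is the same character as the opening one.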
By the way, congratulations on having an awesome name :D