I have the regex below to parse XML-style tags inside HTML code blocks. I cannot use XML libraries. It works as expected on my tests, but I would like some experts to optimize it if possible: I will use it to parse many blocks of code to build a whole template, so it may run about 50 times per template on average, and every clock tick counts for me.
The regex I use for the XML tags is:
(<vars\s*([^\!\?\s<>](?:"[^"]*"|'[^']*'|[^"'<>])*)>([^<]*)(<\!\[CDATA\[(.*?)\]\]>)?(</vars>)?)
Then I parse the attributes with this regex:
([^\s\=\"\']+)\s*=\s*(?:(")(.*?)"|'(.*?)')
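To make the capture groups of the attribute regex concrete, here is a minimal sketch that compiles the pattern once with qr// and collects the pairs into a hash. The helper name parse_attrs and the sample attribute string are assumptions made up for illustration; they are not part of the code being asked about.

```perl
use strict;
use warnings;

# Attribute pattern from above, compiled once with qr// so it can be reused.
my $attr_re = qr{([^\s\=\"\']+)\s*=\s*(?:(")(.*?)"|'(.*?)')}s;

# Hypothetical helper: returns a hash ref of attribute name => value.
sub parse_attrs {
    my ($attrs) = @_;
    my %pairs;
    while ( $attrs =~ /$attr_re/g ) {
        # $2 is defined only when the value was double-quoted,
        # in which case the value is in $3; otherwise it is in $4.
        $pairs{$1} = defined $2 ? $3 : $4;
    }
    return \%pairs;
}

# Made-up sample mixing double and single quotes.
my $pairs = parse_attrs(q{type="var" name='selfopened' size="30"});
print "$_=$pairs->{$_}\n" for sort keys %$pairs;
# prints:
# name=selfopened
# size=30
# type=var
```

Testing with defined instead of truthiness gives the same result here, because the only possible value of $2 is a double quote, but it states the intent more directly.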
Here is the test Perl code:
use strict;
use warnings;
no warnings 'uninitialized';
my $text = <<"END_HTML";
<vars type="var" name="selfopened" content="tag without closing slash" size="30" width="200px" >
<vars type="plug" name="selfclosed" content="self closed tag" size="30" width="200px" />
<vars type="var" name="hasclosing" width="400px" height="300px">content of tag with closing</vars>
<vars id="left-part" width="400px" height="300px"><![CDATA[
cdata start here is may have html tags and 'single' and "double" qoutes
another cdata line
]]></vars>
<vars name="singlelinecdata" width="400px" height="300px"><![CDATA[cdata start here is may have html tags and 'single' and "double" qoutes]]></vars>
</vars>
END_HTML
# find every <vars ...> tag, with optional text, CDATA section and closing tag
while ( $text =~ m{
    (<vars\s*([^\!\?\s<>](?:"[^"]*"|'[^']*'|[^"'<>])*)>([^<]*)(<\!\[CDATA\[(.*?)\]\]>)?(</vars>)?)
}sxgi ) {
    my ($match, $attrs, $value, $cdata, $cdata_content, $closing) = ( $1, $2, $3, $4, $5, $6 );
    print "match: $match, attrs: $attrs, value: $value, cdata: $cdata, closing: $closing\n\n";
    # parse attributes to key, value pairs
    while ( $attrs =~ m{
        ([^\s\=\"\']+)\s*=\s*(?:(")(.*?)"|'(.*?)')
    }sxg ) {
        my $key = $1;
        # $2 holds the double quote when the value was double-quoted
        my $val = ( $2 ? $3 : $4 );
        print "attr: $key=$val\n";
    }
    print "\n";
}
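Since every clock tick matters, one way to judge any change is to time it against the current regex with the core Benchmark module. The sketch below is only a measuring harness under stated assumptions: the generated sample input, the helper count_tags, and the candidate pattern (the same regex with a possessive *+ on the attribute group, which needs Perl 5.10 or later) are all made up for illustration, and the candidate is not claimed to be faster.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample input: 50 simple tags, roughly one template's worth of work.
my $text = join '',
    map { qq{<vars type="var" name="n$_" width="200px">value $_</vars>\n} } 1 .. 50;

# The tag regex from the question, compiled once.
my $tag_re = qr{(<vars\s*([^\!\?\s<>](?:"[^"]*"|'[^']*'|[^"'<>])*)>([^<]*)(<\!\[CDATA\[(.*?)\]\]>)?(</vars>)?)}si;

# Candidate to measure (an assumption, not a proven improvement):
# identical except for the possessive quantifier on the attribute group.
my $tag_re_possessive = qr{(<vars\s*([^\!\?\s<>](?:"[^"]*"|'[^']*'|[^"'<>])*+)>([^<]*)(<\!\[CDATA\[(.*?)\]\]>)?(</vars>)?)}si;

# Hypothetical helper: counts how many tags a pattern finds in $text.
sub count_tags {
    my ($re) = @_;
    my $n = 0;
    $n++ while $text =~ /$re/g;
    return $n;
}

# Sanity check before timing: both patterns must find the same tags.
die "patterns disagree\n"
    unless count_tags($tag_re) == count_tags($tag_re_possessive);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    original   => sub { count_tags($tag_re) },
    possessive => sub { count_tags($tag_re_possessive) },
} );
```

The timings only mean something on input that resembles the real templates, so the generated sample should be replaced with actual blocks, including the CDATA cases, before drawing conclusions.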