You have a very rigid regex to find the XML header. What if there are extra spaces? What if the encoding or the XML version is different? Regex is not the right tool for parsing XML/HTML (see this answer); however, it is understandable why you would want to use one here, especially given the limited scope of what you are trying to do.
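For reference, a pattern that tolerates those variations (extra whitespace, any version or encoding, either quote style) could look like the sketch below. The helper name is_xml_declaration is made up for illustration, and the pattern only recognises a declaration; it does not validate it. It also uses \s instead of \b after "xml", because \b alone would accept <?xml-stylesheet ...?> as well.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the line contains an XML declaration such
# as <?xml version="1.0" encoding="UTF-8"?>, whatever its version, encoding,
# quote style or spacing; returns 0 otherwise.
sub is_xml_declaration {
    my ($line) = @_;
    # "\s" right after "xml" rejects other processing instructions such as
    # <?xml-stylesheet ...?>, which "\b" alone would accept.
    # "[^?]*" then runs up to the "?" of the closing "?>".
    return $line =~ /<\?xml\s[^?]*\?>/ ? 1 : 0;
}

# Prints 1, 1 and 0 for the three samples.
for my $sample (
    q{<?xml version="1.0"?>},
    q{  <?xml  version='1.1'  encoding="ISO-8859-1" ?>},
    q{<?xml-stylesheet type="text/xsl" href="a.xsl"?>},
) {
    printf "%d  %s\n", is_xml_declaration($sample), $sample;
}
```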
That being said, if you are going for simplicity and are willing to accept some possible failures, I would opt for a simpler regex and only insert the stylesheet line once:
    my $replaced = 0;    # declare this once, before your read loop
    if (!$replaced && $inputline =~ m/\<\?xml\b.*\>/) {
        print OUTPUT $inputline;
        print OUTPUT '<?xml-stylesheet type="text/xsl" href="askaway_transcript_stylesheet.xsl"?>'."\n";
        $replaced = 1;
    }
Alternatively, once the stylesheet line has been printed, you could exit your parse loop with Perl's last statement, assuming that is all you are doing in the loop.
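A sketch of that early exit, assuming the remaining lines still have to be copied through unchanged. It reads from and writes to in-memory strings with a made-up sample document so it runs on its own; in your script the handles would be MYXML and OUTPUT.

```perl
use strict;
use warnings;

# Made-up sample input; in your script this comes from MYXML.
my $xml = qq{<?xml version="1.0"?>\n<transcript>\n</transcript>\n};
my $out = '';
open my $in, '<', \$xml or die "Cannot open input: $!";
open my $fh, '>', \$out or die "Cannot open output: $!";

my $stylesheet = '<?xml-stylesheet type="text/xsl" href="askaway_transcript_stylesheet.xsl"?>';

# Copy lines up to and including the XML declaration, add the stylesheet
# instruction after it, then leave the search loop.
while (my $line = <$in>) {
    print {$fh} $line;
    if ($line =~ /<\?xml\s/) {
        print {$fh} "$stylesheet\n";
        last;
    }
}

# No more matching needed: copy the rest of the input verbatim.
print {$fh} $_ while <$in>;

close $in;
close $fh;
print $out;
```

Once the loop exits, no further lines are run through the regex, which also removes any need for a flag.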
Caveat:
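- If your XML is all on one line, or if another tag shares the line with the declaration (which is legal), this will most likely break: the stylesheet instruction is printed after the whole line, so it lands after that other tag instead of in the prolog, which is the only place an xml-stylesheet instruction is allowed.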
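One way around that caveat, still regex-based and still only a sketch: read the whole document into one string and insert the stylesheet instruction directly after the declaration's closing ?>, so it no longer matters what else shares the line. The sample document below is made up.

```perl
use strict;
use warnings;

my $stylesheet = '<?xml-stylesheet type="text/xsl" href="askaway_transcript_stylesheet.xsl"?>';

# Made-up input with the declaration and the root element on one line.
# In a real script, slurp the file first, e.g.
#   my $xml = do { local $/; <MYXML> };
my $xml = q{<?xml version="1.0" encoding="UTF-8"?><transcript><line>Hi</line></transcript>};

# Insert the instruction right after the first XML declaration only (no /g).
# "\s" after "xml" keeps it from matching <?xml-stylesheet ...?>.
(my $result = $xml) =~ s/(<\?xml\s[^?]*\?>)/$1$stylesheet/;

print "$result\n";
```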
Edit:
Your entire while loop would probably look like this:
    my $replaced = 0;    # declared before the loop so it is not reset on every line
    while (my $inputline = <MYXML>) {
        if (!$replaced && $inputline =~ m/\<\?xml\b.*\>/) {
            print OUTPUT $inputline;
            print OUTPUT '<?xml-stylesheet type="text/xsl" href="askaway_transcript_stylesheet.xsl"?>'."\n";
            $replaced = 1;
        } else {
            print OUTPUT $inputline;
        }
    }
Or:
    my $replaced = 0;    # declared before the loop so it is not reset on every line
    while (my $inputline = <MYXML>) {
        print OUTPUT $inputline;
        if (!$replaced && $inputline =~ m/\<\?xml\b.*\>/) {
            print OUTPUT '<?xml-stylesheet type="text/xsl" href="askaway_transcript_stylesheet.xsl"?>'."\n";
            $replaced = 1;
        }
    }
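The same loop can also be written as a self-contained sketch with lexical filehandles instead of the MYXML and OUTPUT barewords. The helper name insert_stylesheet and the sample document are made up for illustration, and the check uses the slightly stricter <\?xml\s rather than the pattern above.

```perl
use strict;
use warnings;

# Hypothetical helper: copies every line from $in to $out and adds the
# stylesheet instruction after the first line that looks like an XML
# declaration. Returns 1 if it inserted the instruction, 0 otherwise.
sub insert_stylesheet {
    my ($in, $out, $href) = @_;
    my $inserted = 0;    # declared before the loop, so it survives between lines
    while (my $line = <$in>) {
        print {$out} $line;
        if (!$inserted && $line =~ /<\?xml\s/) {
            print {$out} qq{<?xml-stylesheet type="text/xsl" href="$href"?>\n};
            $inserted = 1;
        }
    }
    return $inserted;
}

# Demo on a made-up in-memory document.
my $xml    = qq{<?xml version="1.0"?>\n<transcript/>\n};
my $result = '';
open my $in,  '<', \$xml    or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
my $did_insert = insert_stylesheet($in, $out, 'askaway_transcript_stylesheet.xsl');
close $in;
close $out;
print $result;
```

Passing the handles and the href as arguments keeps the file names out of the loop, so the same helper works for any input and output.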