Regular expressions are a poor fit for XML, because regular expressions are not contextual and XML is. The problem is that the same XML can legitimately be written in many semantically identical ways, and any of them can trip up a regex. Parsing XML with a regex gives you brittle code, which might one day break because of a legitimate, within-spec change upstream.
E.g.:
<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test">
</Test>
</root>
Or:
<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test"></Test>
</root>
Or:
<root>
<Test
RequestId="1"
RequestorId="test"
ResponderId="Test"></Test>
</root>
Or:
<root
><Test
RequestId="1"
RequestorId="test"
ResponderId="Test"
></Test></root>
Or:
<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test"/>
</root>
These are all semantically identical, but I'm pretty sure you'd be hard pressed to write a regex that safely handles all of the above (and any other variations you may run into).
On top of that, a regex also has to cope with:
- A similar match elsewhere in the document tree (there can be many Test elements).
- Altered attribute ordering or presence, so the match no longer works.
- A <Test> element that has subelements: because you're wildcarding, the regex catches those rather than the attributes.
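To make the brittleness concrete, here is a minimal sketch using only core Perl. The pattern in `$naive` is hypothetical (it is not taken from the question) and assumes the attributes sit on one line, separated by single spaces, in a fixed order. The `reordered` variant is likewise an illustration of the attribute-ordering point above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical pattern for illustration only: it assumes one line,
# single spaces between attributes, and a fixed attribute order.
my $naive = qr/<Test RequestId="\d+" RequestorId="(\w+)" ResponderId="\w+">/;

# Semantically equivalent ways of writing the same element.
my %variants = (
    one_line     => qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test"></Test>\n</root>},
    multi_line   => qq{<root>\n<Test\nRequestId="1"\nRequestorId="test"\nResponderId="Test"></Test>\n</root>},
    self_closing => qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test"/>\n</root>},
    reordered    => qq{<root>\n<Test RequestorId="test" RequestId="1" ResponderId="Test"></Test>\n</root>},
);

# Report which variants the naive pattern manages to match.
for my $name ( sort keys %variants ) {
    my $result = $variants{$name} =~ $naive ? "matched ($1)" : "no match";
    print "$name: $result\n";
}
```

With this particular pattern only the `one_line` variant matches. You can keep patching the regex for each new variation, but every patch widens what it accepts and makes the other problems in the list above more likely.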
Fortunately, you have an alternative: XPath, a way of defining an expression that works a bit like a regex, but in an XML-aware way.
I would suggest XML::Twig, as it doesn't have a particularly steep learning curve. For your first:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' );
print $twig -> get_xpath('//test',0) -> text;
For your second:
print $twig -> get_xpath('//Test',0) -> att('RequestorId');
This can be turned into a one-liner:
perl -MXML::Twig -0777 -e 'print XML::Twig -> parse ( <> ) -> get_xpath("//test",0) -> text' yourfile
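As a rough check that the parser approach really is indifferent to the formatting differences shown earlier, the sketch below runs the same `get_xpath` call against several of those variants. The XML is held in inline strings rather than a file, and the last variant (reordered attributes) is my own addition for illustration; XML::Twig needs to be installed from CPAN.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Variants of the same document, as in the examples above,
# plus one with the attributes in a different order.
my @variants = (
    qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test">\n</Test>\n</root>},
    qq{<root\n><Test\nRequestId="1"\nRequestorId="test"\nResponderId="Test"\n></Test></root>},
    qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test"/>\n</root>},
    qq{<root>\n<Test ResponderId="Test" RequestorId="test" RequestId="1"/>\n</root>},
);

my @results;
for my $xml (@variants) {
    my $twig = XML::Twig -> new -> parse($xml);

    # First <Test> element anywhere in the document, then one attribute.
    push @results, $twig -> get_xpath( '//Test', 0 ) -> att('RequestorId');
}

print "$_\n" for @results;
```

Each variant should yield the same attribute value, because the parser normalises away whitespace inside tags, attribute order, and the difference between an empty element and a self-closing one before the XPath expression is evaluated.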