Get an XML tag value with a regular expression in Perl

Question

I have XML containing the tag <test>Value</test>, and I want to get the value of that tag using a Perl regular expression. Below is my XML sample:

<?xml version="1.0"?>
<t_volume>
<test>Value</test>
<info>
<info_name>FZGA34177.b1</info_name>
<center_project>4085729</center_project>
<base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
</info>
</t_volume>

I want to get the value of the <test>Value</test> tag. I tried the following, but I am not able to get the value:

$data = ($xml =~/<test>(.*?)<\/test>/i);

The XML I receive can also contain elements like this:

<Test RequestId="1" RequestorId="test" ResponderId="Test">

How can I get the value of RequestorId?

score 2 · Answer 1 · answered Sep 13 '16 at 08:04

Don't use regular expressions to parse XML. Use a proper XML handling tool, such as XML::LibXML:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::LibXML;

# The XML file name is read from the first command-line argument.
my $dom = 'XML::LibXML'->load_xml( location => shift );

my $data = $dom->findvalue('t_volume/test');
say $data;

my $requestor_id = $dom->findvalue('//Test/@RequestorId');
say $requestor_id;
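As a hedged illustration of why a regex is risky here, the sketch below runs the question's own pattern against the question's sample with one hypothetical change: an earlier, commented-out <test> element. An XML parser skips comments; the regex cannot tell the difference and returns the wrong value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's document, plus a hypothetical commented-out <test> element.
my $xml = <<'XML';
<?xml version="1.0"?>
<t_volume>
<!-- <test>Old value</test> -->
<test>Value</test>
</t_volume>
XML

# The question's non-greedy regex, evaluated in list context.
my ($data) = $xml =~ /<test>(.*?)<\/test>/i;

# The regex matches inside the comment, so this prints "Old value", not "Value".
print "$data\n";
```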

score 2 · Answer 2 · edited May 23 '17 at 12:32

Regular expressions are a bad idea for XML, because regular expressions are not contextual, whereas XML is. The problem is that there are many semantically identical ways to write the same piece of XML, which can legitimately vary and will trip up a regex. You create brittle code by doing so, because it might break one day due to an upstream change that is legitimate and within spec.

E.g.:

<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test">
</Test>
</root>

Or:

<root>
  <Test RequestId="1" RequestorId="test" ResponderId="Test"></Test>
</root>

Or:

<root>
  <Test
      RequestId="1"
      RequestorId="test"
      ResponderId="Test"></Test>
</root>

Or:

<root
><Test
RequestId="1"
RequestorId="test"
ResponderId="Test"
></Test></root>

Or:

<root>
  <Test RequestId="1" RequestorId="test" ResponderId="Test"/>
</root>

These are all semantically identical, but I'm pretty sure you'd be hard pressed to write a regex that safely handles all of the above (and any others you may run into).
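A minimal sketch of the point above: it applies the attribute regex proposed in the accepted answer to three of the layouts shown here, plus one hypothetical variant that uses single quotes (equally valid XML). The pattern expects a literal space after "Test" and double-quoted values, so it silently fails on half of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @variants = (
    # Opening and closing tags on separate lines.
    qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test">\n</Test>\n</root>},
    # One attribute per line: a newline, not a space, follows "<Test".
    qq{<root>\n  <Test\n      RequestId="1"\n      RequestorId="test"\n      ResponderId="Test"></Test>\n</root>},
    # Self-closing element.
    qq{<root>\n  <Test RequestId="1" RequestorId="test" ResponderId="Test"/>\n</root>},
    # Hypothetical variant: single-quoted attribute values.
    qq{<root><Test RequestId='1' RequestorId='test' ResponderId='Test'/></root>},
);

my @results;
for my $xml (@variants) {
    my ($id) = $xml =~ /<Test [^>]*\bRequestorId="([^"]*)"/;
    push @results, defined $id ? $id : '(no match)';
}

# Prints: test, (no match), test, (no match)
print "$_\n" for @results;
```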

A regex is also vulnerable to:

A similar match elsewhere in the document tree (there can be many Test elements).
Changes in attribute ordering or presence, so the pattern no longer matches.
A <Test> element that has child elements: because the pattern uses a wildcard, it picks those up instead of the attributes.
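The first two pitfalls can be sketched in a few lines of core Perl. The document below is hypothetical: it has two Test elements in different branches, and the second lists its attributes in a different (equally valid) order. A pattern written for one attribute order picks up the wrong element and misses the reordered one entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical document: two <Test> elements, attributes reordered in the second.
my $xml = <<'XML';
<root>
  <old><Test RequestId="0" RequestorId="stale" ResponderId="Test"/></old>
  <new><Test ResponderId="Test" RequestorId="test" RequestId="1"/></new>
</root>
XML

# A pattern that assumes RequestId comes first, then RequestorId.
my $rigid = qr/<Test RequestId="[^"]*" RequestorId="([^"]*)"/;

my ($first) = $xml =~ $rigid;     # first match anywhere in the document
my @all     = $xml =~ /$rigid/g;  # every match, in list context

print "first match: $first\n";    # "stale": the regex cannot tell <old> from <new>
print "all matches: @all\n";      # only "stale": the reordered element is missed
```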

Fortunately, you have an alternative: XPath, a way of writing expressions that works a bit like a regex, but in an XML-aware way.
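To make the XPath idea concrete, here is a hedged sketch using XML::LibXML (the module from the first answer; it must be installed from CPAN). The sample document is hypothetical and combines the awkward cases above: two Test elements, a multi-line tag with reordered attributes, and a commented-out element. The XPath expressions are unaffected by any of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical document mixing layouts that trip up a regex.
my $xml = <<'XML';
<root>
  <old><Test RequestId="0" RequestorId="stale" ResponderId="Test"/></old>
  <new>
    <Test
        ResponderId="Test"
        RequestorId="test"
        RequestId="1"></Test>
  </new>
  <!-- <Test RequestorId="commented-out"/> -->
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# The path states which Test element we want; whitespace, attribute
# order and comments do not matter.
my $requestor_id = $dom->findvalue('/root/new/Test/@RequestorId');
print "$requestor_id\n";    # test

# Every RequestorId attribute in document order (the comment is ignored).
my @all = map { $_->value } $dom->findnodes('//Test/@RequestorId');
print "@all\n";             # stale test
```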

I would suggest XML::Twig, as it doesn't have a particularly steep learning curve. For your first question:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' ); 

print $twig -> get_xpath('//test',0) -> text;

For your second question:

print $twig -> get_xpath('//Test',0) -> att('RequestorId');

This can be condensed into a one-liner:

perl -MXML::Twig -0777 -e 'print XML::Twig -> parse ( <> ) -> get_xpath("//test",0) -> text' yourfile

redneb · Accepted Answer · score 1 · 2016-09-13T08:09:08.647

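In list context, the expression $xml =~ /<test>(.*?)<\/test>/i returns a list of all the captured groups. Your code evaluates it in scalar context, where it only returns true or false (1 or the empty string), so $data never holds the tag's value. Put parentheses around the variable on the left to force list context: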

($data) = $xml =~/<test>(.*?)<\/test>/i;
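A minimal side-by-side sketch of the two contexts, using a trimmed version of the question's XML (the variable name $scalar is only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $xml = '<t_volume><test>Value</test></t_volume>';

# Scalar context (the question's version): the match returns 1 on success,
# even though the right-hand side is wrapped in parentheses.
my $scalar = ($xml =~ /<test>(.*?)<\/test>/i);

# List context: parentheses around the variable on the left make the match
# return its captured groups.
my ($data) = $xml =~ /<test>(.*?)<\/test>/i;

print "scalar context: $scalar\n";    # scalar context: 1
print "list context:   $data\n";      # list context:   Value
```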

Edit: For the second example, you can extract the information the same way, by capturing it with a set of parentheses:

($RequestorId) = $xml =~ /<Test [^>]*\bRequestorId="([^"]*)"/;
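The same list-context rule extends to the /g modifier: in list context a global match returns the captures from every match, not just the first. A short sketch, assuming a hypothetical input with two Test elements on separate lines (the second RequestorId value, "other", is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two <Test> elements.
my $xml = <<'XML';
<Test RequestId="1" RequestorId="test" ResponderId="Test">
<Test RequestId="2" RequestorId="other" ResponderId="Test">
XML

# One capture in list context: only the first RequestorId.
my ($RequestorId) = $xml =~ /<Test [^>]*\bRequestorId="([^"]*)"/;

# With /g in list context: the capture from every match.
my @requestor_ids = $xml =~ /<Test [^>]*\bRequestorId="([^"]*)"/g;

print "$RequestorId\n";      # test
print "@requestor_ids\n";    # test other
```

This inherits all the fragility discussed in the other answers and comments; it only shows how Perl returns the captures.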

edited Sep 13 '16 at 08:09 · answered Sep 13 '16 at 07:53 · redneb

If I get the data in this form, how can I get the RequestorId value from it? – Developer Sep 13 '16 at 07:58

This may be correct, but honestly it's a bad solution to the problem - XML is a contextual language, and regular expressions are not. It can never be better than a hack to regex them. – Sobrique Sep 13 '16 at 08:06
@Sobrique Agreed. The OP has decent rep, so I hope they know what they are doing. – redneb Sep 13 '16 at 08:07

Downvoted for encouraging the parsing of XML using regexes. – Dave Cross Sep 13 '16 at 08:42
