I have string input like the following:

<Name>IncludeLeafPortfolios</Name><DataType>Boolean</DataType><Value>True</Value>
<Name>HierarchyDate</Name><DataType>Int</DataType><IsFixed>false</IsFixed>
<Name>HierarchyDate</Name><DataType>Int</DataType>
<Name>HierarchyDate</Name><DataType>Int</DataType><Value>0</Value><IsFixed>false</IsFixed>
<Name>HierarchyDate</Name><DataType>Int</DataType><Value>0</Value><IsFixed>false</IsFixed>

The Name tag always exists and is of interest. The DataType tag is not of interest. The Value and IsFixed tags may or may not exist. The goal is to capture the Value tag and the IsFixed tag whenever one or both of them exist.

My solution is not working:

$element =~ m/^<Name>([\w\s]*)<\/Name>.*([<Value>[\w+\d+]<\/Value>]?)(<IsFixed>[\w+]<\/IsFixed>])?$

Please suggest. Thanks.

LOUDKING
  • You can check whether at least one of them exists, but you can only capture one of them if both exist: if($element =~ m'.*|.*'i){}. If you put parentheses around .*, you'd only get the value of the first one when both tags exist, due to short-circuit evaluation. – Shiping Mar 14 '17 at 03:23
  • Are you sure that's _exactly_ how your XML looks? It looks a bit oddly structured (e.g. I'd expect 'parent' nodes around the Name elements). – Sobrique Mar 14 '17 at 09:33

2 Answers

That data looks like XML. Parse it using a library like XML::LibXML, then perform operations on the resulting structure.

Do not use regular expressions to process XML. The results are just as bad as trying to use regular expressions for HTML.
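As a rough sketch of the parser approach, assuming each input line is a standalone fragment like those in the question: wrap the line in a root element so it becomes well-formed, then look up the optional tags. The <item> wrapper and the parse_line helper are hypothetical names chosen for illustration; load_xml, documentElement, findnodes and textContent are XML::LibXML calls.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;

# Sample lines from the question; each is a fragment without a root element.
my @lines = (
    '<Name>IncludeLeafPortfolios</Name><DataType>Boolean</DataType><Value>True</Value>',
    '<Name>HierarchyDate</Name><DataType>Int</DataType><IsFixed>false</IsFixed>',
    '<Name>HierarchyDate</Name><DataType>Int</DataType>',
    '<Name>HierarchyDate</Name><DataType>Int</DataType><Value>0</Value><IsFixed>false</IsFixed>',
);

# Hypothetical helper: returns a hashref with Name, Value and IsFixed,
# leaving a key undefined when its tag is absent from the line.
sub parse_line {
    my ($line) = @_;

    # Assumption: wrapping in <item> is enough to make the fragment well-formed.
    my $doc  = XML::LibXML->load_xml( string => "<item>$line</item>" );
    my $item = $doc->documentElement;

    my %row;
    for my $tag (qw( Name Value IsFixed )) {
        # findnodes returns an empty list when the tag is missing.
        my ($node) = $item->findnodes("./$tag");
        $row{$tag} = $node ? $node->textContent : undef;
    }
    return \%row;
}

for my $line (@lines) {
    my $row = parse_line($line);
    print join( ",", map { $row->{$_} // '' } qw( Name Value IsFixed ) ), "\n";
}
```

Because a missing tag comes back as undef, the caller can tell "absent" apart from "present but empty", which a single regex with optional groups makes awkward.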

Community

XML is context sensitive. Regular expressions are not. You cannot reliably parse XML with regular expressions for this reason.

So use a parser. I like XML::Twig, and it would go a bit like this:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Assumes the elements in the file are wrapped in a single root element.
my $twig = XML::Twig->new->parsefile('your_file.xml');

my @keys = qw( Name Value IsFixed );

my @rows;
my %current_row;
# Iterate over the children of the root element.
foreach my $node ( $twig->root->children ) {
    # Extract tag and content.
    my $tag     = $node->tag;
    my $content = $node->text;
    # A Name tag starts a new row, so store the previous row first.
    if ( $tag eq 'Name' and %current_row ) {
        push @rows, {%current_row};
        %current_row = ();
    }
    $current_row{$tag} = $content;
}
# Store the final row, which no later Name tag would flush.
push @rows, {%current_row} if %current_row;

# Output the results.
print join( ",", @keys ), "\n";
foreach my $row (@rows) {
    print join( ",", map { $row->{$_} // '' } @keys ), "\n";
}

Which, for the sample input wrapped in a single root element, outputs:

Name,Value,IsFixed
IncludeLeafPortfolios,True,
HierarchyDate,,false
HierarchyDate,,
HierarchyDate,0,false
HierarchyDate,0,false

I would, however, note that your XML is messy - are you sure that's how it's structured? Because normally if you've got 'associated' tags, then they're grouped within a node.

e.g. something like:

<xml>
  <item>
     <Name>HierarchyDate</Name><DataType>Int</DataType><IsFixed>false</IsFixed>
  </item>
</xml>

Which would greatly simplify the problem, because you could:

foreach my $item ( $twig -> root -> children ) {
   print join ",", (map { $item -> first_child_text($_) // '' } @keys),"\n"; 
}
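
For completeness, a self-contained sketch of that grouped case, parsing from a string instead of a file. The <items> root, the sample data and the rows_from helper are assumptions made for illustration; first_child_text returns an empty string when the child is missing, so absent tags simply become empty fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my @keys = qw( Name Value IsFixed );

# Hypothetical helper: returns one arrayref of field values per <item>.
sub rows_from {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);

    my @rows;
    foreach my $item ( $twig->root->children('item') ) {
        # Missing children yield '' from first_child_text.
        push @rows, [ map { $item->first_child_text($_) } @keys ];
    }
    return @rows;
}

# Assumed sample input with each record grouped inside its own <item>.
my $xml = <<'XML';
<items>
  <item>
    <Name>HierarchyDate</Name><DataType>Int</DataType><IsFixed>false</IsFixed>
  </item>
  <item>
    <Name>IncludeLeafPortfolios</Name><DataType>Boolean</DataType><Value>True</Value>
  </item>
</items>
XML

print join( ",", @keys ), "\n";
print join( ",", @$_ ), "\n" for rows_from($xml);
```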
Sobrique