I have an array of elements which are basically HTML tags. Below is an example:

<L>
 <LI>
  <LI_Label>Label1</LI_Label>
  <LI_Title>Title1</LI_Title>  
 </LI>
  <LI>
  <LI_Label>Label2</LI_Label>
  <LI_Title>Title2</LI_Title>  
 </LI>
 <LI>
  <LI_Label>Label3</LI_Label>
  <LI_Title>Title3</LI_Title>  
 </LI>
</L>

I am trying to extract only the LI_Title elements and store them in a separate array, which I then want to concatenate into one complete string. To extract and store them, I am using the script below. However, when I print the array, the entire block of HTML is in the @found_LI array, not just the LI_Title elements as I expect. Can someone point out what I am doing wrong?

foreach (@po_siblings)
{
    if ($_ =~ /LI_Title/)
    {
        push(@found_LI,$_);
    }
}
print "@found_LI\n";
ruakh
BRZ
  • The problem is that `@po_siblings` does not contain what you think it does. You think it's an array with one element per line, but its elements are actually bigger than that. (Maybe even the whole thing is just a single element?) – ruakh Jul 07 '13 at 16:32
  • And the simplest fix is to replace your entire `if` statement with something like `push @found_LI, m/.*?<\/LI_Title>/g`. – ruakh Jul 07 '13 at 16:33
  • I suggest using an HTML parser. http://stackoverflow.com/questions/4598162/html-parsing-in-perl – Mike Clark Jul 07 '13 at 16:47

1 Answer

Since your sample "HTML" is in fact well-formed XML, why not use an XML parser and find the nodes and values with XPath queries? Here's a sample script that solves your problem using XML::LibXML:

use strict;
use warnings;
use XML::LibXML;

my $blob = <<'EOF';
<L>
 <LI>
  <LI_Label>Label1</LI_Label>
  <LI_Title>Title1</LI_Title>  
 </LI>
  <LI>
  <LI_Label>Label2</LI_Label>
  <LI_Title>Title2</LI_Title>  
 </LI>
 <LI>
  <LI_Label>Label3</LI_Label>
  <LI_Title>Title3</LI_Title>  
 </LI>
</L>
EOF

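# Parse the string into a DOM document.
my $p = XML::LibXML->new;
my $doc = $p->parse_string($blob);
# Select every LI_Title node via XPath and join their text with spaces.
print join(" ", map { $_->textContent } $doc->findnodes('/L/LI/LI_Title')), "\n";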
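For comparison, the regex approach suggested in the comments can be sketched with core Perl only. This is a minimal sketch, not a general parser: it assumes, as the comments guess, that @po_siblings holds the whole block as a single string, and that each LI_Title element sits on one line with no nested tags. The variable $joined is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Assumption: the markup arrives as ONE string, which would explain why the
# original loop pushed the entire block instead of individual lines.
my $blob = <<'EOF';
<L>
 <LI>
  <LI_Label>Label1</LI_Label>
  <LI_Title>Title1</LI_Title>
 </LI>
 <LI>
  <LI_Label>Label2</LI_Label>
  <LI_Title>Title2</LI_Title>
 </LI>
 <LI>
  <LI_Label>Label3</LI_Label>
  <LI_Title>Title3</LI_Title>
 </LI>
</L>
EOF
my @po_siblings = ($blob);

my @found_LI;
for my $chunk (@po_siblings) {
    # In list context, a /g match with one capture group returns every
    # captured substring, here the text inside each LI_Title element.
    push @found_LI, $chunk =~ m{<LI_Title>(.*?)</LI_Title>}g;
}

# Concatenate the extracted titles into one string (hypothetical name).
my $joined = join ' ', @found_LI;
print "$joined\n";    # prints "Title1 Title2 Title3"
```

The capture group keeps only the element text; dropping the parentheses would return the full matched tags instead. For anything beyond simple, regular input like this, a real parser remains the safer choice.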
Slaven Rezic