Grab values between delimiters in Perl from chomped line

Question

I am trying to grab the values between two delimiters in Perl using regex. I am opening a file and using chomp to go through the file line by line. Example of how the file looks:

"This is <tag> an </tag> example
of the <tag> file </tag> that I
am <tag> trying </tag> to <tag> parse </tag>"

I am able to get the first couple of words: "an", "file", but on the third line I can only get "trying" and not "parse". This is the code I am trying to use:

while (chomp($line = <$filename>)){
   ($tag) = $line =~ m/<tag>(.*?)<\/tag>/;
   push(@tagarray, $tag);
}

I suspect this has something to do with chomp but don't see how to parse the file differently.

I normally use [HTML::TreeBuilder](http://search.cpan.org/~kentnl/HTML-Tree-5.07/lib/HTML/TreeBuilder.pm) (for HTML) — zdim, Nov 07 '17 at 17:11
If you're processing HTML or XML then you should use a library specifically for that purpose, rather than trying to create your own using regex patterns. — Borodin, Nov 07 '17 at 18:32

mwp · Answer 1 · 2017-11-07T17:13:16.830

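Without /g, the match stops at the first hit on each line. Add the /g modifier and assign the result to an array so that every match on the line is returned: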

my @tags = $line =~ m/<tag>(.*?)<\/tag>/g;
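
Applied to the loop from the question, this could look roughly like the sketch below. It reads the sample text through an in-memory filehandle so it runs without an external file; the names `$text` and `$fh` and the `\s*` around the capture group (to drop the padding spaces) are my assumptions, not part of the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input from the question, held in a string for the demo (assumption).
my $text = <<'EOF';
This is <tag> an </tag> example
of the <tag> file </tag> that I
am <tag> trying </tag> to <tag> parse </tag>
EOF

# Open the string as a read-only in-memory file.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @tagarray;
while (my $line = <$fh>) {
    chomp $line;    # chomp in the loop body, not in the loop condition
    # /g in list context returns every capture on the line;
    # a line without a match contributes nothing.
    push @tagarray, $line =~ m/<tag>\s*(.*?)\s*<\/tag>/g;
}
close $fh;

print join(",", @tagarray), "\n";    # prints "an,file,trying,parse"
```

Moving chomp out of the while condition is a separate precaution: chomp returns the number of characters it removed, so as a loop condition it would end the loop early on a final line that has no trailing newline.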

You may be better off using an HTML parser to perform this operation. Parsing HTML with regular expressions is fraught with peril. For example, take a look at HTML::TagParser:

use HTML::TagParser;

my $html = HTML::TagParser->new(<<'EOF');
This is <tag> an </tag> example
of the <tag> file </tag> that I
am <tag> trying </tag> to <tag> parse </tag>
EOF

my @tags = $html->getElementsByTagName('tag');
my @tagarray = map { $_->innerText() } @tags;

score 7 · Accepted Answer · answered Nov 07 '17 at 17:01

"I suspect this has something to do with chomp"

No, chomp is not the cause. Without /g, the match returns only the first capture on each line, and ($tag) holds just that one value.

Make the regex global (/g) and store the results in an array.

#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

my $line = "am <tag> trying </tag> to <tag> parse </tag>";
my @tags = $line =~ m/<tag>(.*?)<\/tag>/g;
say join ",", @tags;
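
To see the difference side by side, here is a small sketch that runs both forms of the match on the same line. The variable names `$first` and `@all` are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = "am <tag> trying </tag> to <tag> parse </tag>";

# Without /g: list context returns only the capture of the first match.
my ($first) = $line =~ m/<tag>(.*?)<\/tag>/;

# With /g: list context returns the capture of every match on the line.
my @all = $line =~ m/<tag>(.*?)<\/tag>/g;

print "first only: [$first]\n";
print "all: [", join("][", @all), "]\n";
```

The captured values keep the spaces that surround the words inside the tags, which is why the brackets are printed around each one.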
