I want to get the entire contents of a section tag in an HTML string using Perl. I'm using the following line of code, but it doesn't seem to work:
$article_content =~ s/^.*?<section>(.*)<\/section>.*?$/$1/;
Change (.*) to (.*?) and see if that helps.
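To illustrate the difference, here is a minimal sketch, assuming a made-up single-line input with two section elements (the sample string and the variable names $html, $greedy and $lazy are only for illustration, not from the question). The greedy form runs on to the last closing tag, while the non-greedy form stops at the first one:

```perl
use strict;
use warnings;

# Hypothetical sample input: two section elements on a single line.
my $html = '<section>first</section><p>gap</p><section>second</section>';

# Greedy: (.*) grabs as much as possible, so the capture runs
# all the way to the LAST </section> in the string.
my ($greedy) = $html =~ /<section>(.*)<\/section>/;

# Non-greedy: (.*?) grabs as little as possible, so the capture
# stops at the FIRST </section>.
my ($lazy) = $html =~ /<section>(.*?)<\/section>/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Note that this only changes where the match ends. It does not help if the section spans several lines.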
Don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/perl.html for examples of how to properly parse HTML with Perl modules.
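The first problem is that you assume . matches any character, but that's only the case when using /s. Without /s, . matches any character except a newline, so the pattern fails as soon as the section spans more than one line.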
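A small sketch of the effect of /s, assuming a made-up multi-line input (the sample HTML and the names $without_s and $section are only for illustration). It uses a capturing match rather than the substitution from the question, which is a swapped-in approach, not the asker's original code:

```perl
use strict;
use warnings;

# Hypothetical multi-line input, standing in for the real HTML string.
my $article_content = "<html>\n<section>\n<p>Hello</p>\n</section>\n</html>\n";

# Without /s, . cannot cross the newlines inside the section,
# so the match fails.
my $without_s = $article_content =~ /<section>(.*?)<\/section>/ ? 1 : 0;

# With /s, . also matches newlines, so the capture succeeds.
my ($section) = $article_content =~ /<section>(.*?)<\/section>/s;

print "matched without /s: $without_s\n";
print "section content: [$section]\n" if defined $section;
```

This still breaks on nested sections or on tags with attributes such as <section class="x">, which is the usual argument for using a real HTML parser instead.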