How to match a table tag in Perl regex?

Question

I tried

/(^<table)(.*?)($>)/

It should match everything between the `<` and `>` of the table tag, but it does not.

I think you meant: `/(^<table)(.*?)(>$)/`, but it is unlikely that there will be no whitespace before or after your `<table>` tag. — Hunter McMillen, Jun 26 '14 at 16:27
Potential [`XY Problem`](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem): Please explain in detail exactly what your goal is with example data, or we are unlikely to be able to advise you. — Miller, Jun 26 '14 at 17:04
My goal is to remove all tags from a table, and leave only plain text. So I need to match everything within tags — Alex, Jun 26 '14 at 17:15
@Alex, Then this is almost certainly an XY Problem. Where are you getting this table from? Are you downloading it from the internet? what method are you using to obtain it? (yes, there are 3 questions there). — Miller, Jun 26 '14 at 17:56
Don't use regular expressions to parse HTML! See [HTML::TreeBuilder](https://metacpan.org/pod/HTML::TreeBuilder) for example. — Kaoru, Jun 26 '14 at 18:57

score 0 · Accepted Answer · edited May 23 '17 at 12:05

As mentioned in the comments on this question, it's not really practical to parse HTML with regular expressions.
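
To make that concrete, here is a minimal sketch using only core Perl. The helper name `first_table_tag` and the sample strings are made up for this illustration and are not from the question. A pattern such as `/<table\b[^>]*>/` copes with a simple tag, but it is cut short as soon as an attribute value contains a `>`:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (illustration only): return the first substring that
# looks like an opening <table ...> tag, or undef if there is none.
sub first_table_tag {
    my ($html) = @_;
    # \b keeps "<tablet" from matching; [^>]* stops at the first ">".
    return $html =~ /(<table\b[^>]*>)/i ? $1 : undef;
}

# Leading whitespace is fine here because the pattern is not anchored with ^.
my $simple = first_table_tag(qq{  <table border>\n<tr><td>a</td></tr>});

# A ">" inside a quoted attribute value ends the match too early.
my $tricky = first_table_tag(q{<table title="a > b"><tr><td>a</td></tr>});

print "simple: $simple\n";
print "tricky: $tricky\n";
```

The first call returns the whole tag, while the second returns only `<table title="a >`. Handling quoted attributes, comments, and similar cases correctly is what a real HTML parser already does.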

Here's an example using Mojo::DOM, inspired by this StackOverflow answer:

#!/usr/bin/env perl

use strict ;
use warnings ;

use Mojo::DOM ;

my $html = <<EOHTML;
<!DOCTYPE html>
<html>
<head>
<title>Sample HTML with a table</title>
</head>
<body>
     <table border>
        <tr> <td>a</td> <td>b</td> <td>c</td> </tr>
        <tr> <td>1</td> <td>2</td> <td>3</td> </tr>
     </table>
</body>
</html>
EOHTML

# Parse the HTML into a DOM tree.
my $dom = Mojo::DOM->new ;

$dom->parse( $html ) ;

# Print the text content of every table cell, one per line.
for my $cell ( $dom->find( 'td' )->each ) {

    print $cell->all_text . "\n" ;

}

The output is:

a
b
c
1
2
3
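
If the goal is plain text that still reflects the table layout, a variation along the following lines might help. This is only a sketch: the helper name `table_rows_as_text` and the tab separator are assumptions made for illustration, and it needs Mojolicious installed, just like the example above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical helper (illustration only): return one string per <tr>,
# with the text of its cells joined by tabs.
sub table_rows_as_text {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);
    my @lines;
    for my $row ( $dom->find('table tr')->each ) {
        # Select <th> as well as <td> so header rows are not dropped.
        my @cells = $row->find('th, td')->map('all_text')->each;
        push @lines, join "\t", @cells;
    }
    return @lines;
}

print "$_\n" for table_rows_as_text(
    '<table border><tr> <td>a</td> <td>b</td> <td>c</td> </tr>'
  . '<tr> <td>1</td> <td>2</td> <td>3</td> </tr></table>'
);
```

For the sample table this prints two lines, `a`, `b`, `c` and `1`, `2`, `3`, each separated by tabs. Nested tables would need extra care, since the `table tr` selector also matches rows of inner tables.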
