
I tried

/(^<table)(.*?)($>)/

It should match everything between `<` and `>` for the table tag, but it does not.
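For reference, the pattern most likely fails for two reasons. `^` anchors to the very start of the string, so `^<table` only succeeds when the input begins with `<table`. And `$>` is not "end of string, then `>`": inside a Perl pattern, `$>` is most likely interpolated as the effective-user-ID special variable. The minimal sketch below captures the attributes of an opening `<table ...>` tag without anchors. The helper name `table_tag_attributes` and the sample input are hypothetical, and it assumes no attribute value contains a literal `>`:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample input: the <table> tag is not at the start of the
# string, so an anchored /^<table/ pattern would not match it.
my $html = qq{<body>\n  <table border class="data">\n<tr><td>a</td></tr>\n</table>};

# Match "<table" as a whole word, then capture everything up to the first ">".
# Assumption: no attribute value contains a literal ">".
sub table_tag_attributes {
    my ($text) = @_;
    return $text =~ /<table\b([^>]*)>/ ? $1 : undef;
}

my $attrs = table_tag_attributes($html);
print defined $attrs ? "attributes: [$attrs]\n" : "no <table> tag found\n";
```

This only locates the opening tag; it does not remove tags from the table's contents.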

Alex
  • I think you meant: `/(^<table)(.*?)(>$)/`, but it is unlikely that there will be no whitespace before or after your `<table>` tag.
    – Hunter McMillen Jun 26 '14 at 16:27
  • An example would be better. – Avinash Raj Jun 26 '14 at 16:27
  • Potential [`XY Problem`](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem): Please explain in detail exactly what your goal is with example data, or we are unlikely to be able to advise you. – Miller Jun 26 '14 at 17:04
  • My goal is to remove all tags from a table and leave only plain text, so I need to match everything within tags. – Alex Jun 26 '14 at 17:15
  • @Alex, then this is almost certainly an XY Problem. Where are you getting this table from? Are you downloading it from the internet? What method are you using to obtain it? (Yes, there are 3 questions there.) – Miller Jun 26 '14 at 17:56
  • Don't use regular expressions to parse HTML! See [HTML::TreeBuilder](https://metacpan.org/pod/HTML::TreeBuilder) for example. – Kaoru Jun 26 '14 at 18:57

1 Answer


As mentioned in the comments on this question, it's not really practical to parse HTML with regular expressions.
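To make that concrete (an illustrative case, not from the original thread): a naive tag stripper that deletes anything shaped like `<...>` works on simple markup but breaks as soon as a quoted attribute value contains `>`, which HTML allows. The helper name `strip_tags_naive` is hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A naive tag stripper: delete every run of "<", non-">" characters, ">".
sub strip_tags_naive {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

# Works on simple markup: prints "ab"
print strip_tags_naive('<tr><td>a</td><td>b</td></tr>'), "\n";

# A ">" inside a quoted attribute value ends the "tag" too early,
# leaving attribute debris behind: prints y">1
print strip_tags_naive('<td title="x>y">1</td>'), "\n";
```

A real HTML parser tracks quoting, comments and similar constructs, which is why the answer below uses one.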

Here's an example using Mojo::DOM, inspired by this StackOverflow answer:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Mojo::DOM;

# Quoted heredoc: the HTML is taken literally, with no variable interpolation
my $html = <<'EOHTML';
<!DOCTYPE html>
<html>
<head>
<title>Sample HTML with a table</title>
</head>
<body>
     <table border>
        <tr> <td>a</td> <td>b</td> <td>c</td> </tr>
        <tr> <td>1</td> <td>2</td> <td>3</td> </tr>
     </table>
</body>
</html>
EOHTML

# Parse the document into a DOM tree
my $dom = Mojo::DOM->new($html);

# Print the text content of every table cell, one cell per line
for my $td ( $dom->find('td')->each ) {
    print $td->all_text, "\n";
}
```

The output is:

```
a
b
c
1
2
3
```
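Since the stated goal was to strip all tags from the table while keeping plain text, here is a variation that keeps the row structure: one line per `<tr>`, with cells separated by tabs. It is a sketch that assumes Mojolicious is installed (as the code above already does) and that the first `<table>` in the input is the one wanted; the helper name `table_to_text` is hypothetical. It relies on `Mojo::Collection`'s `map` (which accepts either a callback or a method name) and `join`:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical helper: turn each <tr> of the first <table> into one line of
# tab-separated cell text, discarding all markup. Returns '' if no table.
sub table_to_text {
    my ($html) = @_;
    my $table = Mojo::DOM->new($html)->at('table') or return '';
    return $table->find('tr')
                 ->map(sub { $_->find('td, th')->map('all_text')->join("\t") })
                 ->join("\n")
                 ->to_string;
}

my $html = <<'EOHTML';
<table border>
   <tr> <td>a</td> <td>b</td> <td>c</td> </tr>
   <tr> <td>1</td> <td>2</td> <td>3</td> </tr>
</table>
EOHTML

print table_to_text($html), "\n";
```

Only the text inside `td`/`th` cells is collected, so the whitespace between cells in the source does not leak into the result.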
GJoe