I need to replace the contents of every <div class="definition">text here</div> with "...", but not if a { or } appears inside it. I tried this with Perl, but it deletes too much, sometimes matching everything from the first <div> to the last </div>:

perl -pe 's/<div class="definition">[^{].*[^<]<\/div>/<div class="definition">...<\/div>/g'

E.g.:

This is a file <div class="definition">text here</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">text here</div>.
This is a file <div class="definition">text here</div>.

Output:

This is a file <div class="definition">...</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">...</div>.
This is a file <div class="definition">...</div>.

How can I replace any content there, but not if { or } are found inside?

Village
  • `How can I replace any content there, but not if { or } are found inside?` Please post an example that replicates this. – Avinash Raj Oct 03 '14 at 09:57
  • If you're parsing HTML, you should really use a dedicated HTML parser. Regular expressions can cause all kinds of mischief when used on HTML. – i alarmed alien Oct 03 '14 at 10:07
  • Don't use regular expressions to manipulate HTML. See also http://radar.oreilly.com/2014/02/parsing-html-with-perl-2.html – Sinan Ünür Oct 03 '14 at 14:34
  • @ialarmedalien read http://stackoverflow.com/a/4231482/3297613 and http://meta.stackoverflow.com/q/261561/3622940 – Avinash Raj Oct 03 '14 at 14:55

2 Answers

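You could try the Perl command below. The character class [^{}<]+ cannot match a brace or the start of a tag, so each match stays within a single div, and any div containing { or } is left alone.

$ perl -pe 's/(<div class="definition">)[^{}<]+(<\/div>)/$1...$2/g' file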
This is a file <div class="definition">...</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">...</div>.
This is a file <div class="definition">...</div>.
Avinash Raj

Although it's not quite a one-liner, it is easy to do the task you want with some Mojo::DOM magic. Here is the code:

#!/usr/bin/perl
use warnings;
use strict;
use feature ':5.10';

use Mojo::DOM;

my $html = 'This is a file <div class="definition">text here</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">text here</div>.
This is a file <div class="definition">text here</div>.';

my $dom = Mojo::DOM->new( $html );

# Keep only the divs whose text contains no braces, then call replace
# on each matching element explicitly.
$dom->find( 'div.definition' )
    ->grep(sub { $_->text !~ /[{}]/ })
    ->each(sub { $_->replace('<div class="definition">...</div>') });

say $dom;

Output:

This is a file <div class="definition">...</div>.
This is a file <div class="definition">{text here}</div>. This is a file <div class="definition">...</div>.
This is a file <div class="definition">...</div>.

To explain what's going on:

# this finds all div nodes with class definition
$dom->find( 'div.definition' )

# then filter the collection, keeping only the nodes whose text contains neither { nor }
->grep(sub { $_->text !~ /[{}]/ })

# replace each remaining node with '<div class="definition">...</div>'
->each(sub { $_->replace('<div class="definition">...</div>') });
i alarmed alien