How can I delete characters between < and > in Perl?

Question

I need to write a Perl script to read in a file, and delete anything inside < >, even if they're on different lines. That is, if the input is:

Hello, world. I <enjoy eating
bagels. They are quite tasty.
I prefer when I ate a bagel to
when I >ate a sandwich. <I also
like >bananas.

I want the output to be:

Hello, world. I ate a sandwich. bananas.

I know how to do this with a regex if the text is on one line, but I don't know how to do it across multiple lines. Ultimately I need to be able to conditionally delete parts of a template so I can generate parametrized config files. I thought Perl would be a good language for this, but I am still getting the hang of it.

Edit: I also need to handle more than one instance of < >.

score 6 · Answer 1 · answered Apr 10 '09 at 14:24


You may want to check out the Perl module Text::Balanced, which is part of the core distribution. I think it will help you here. Generally, one wants to avoid regexes for this sort of thing if the subject text is likely to contain an inner set of delimiters; it can get very messy.
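As a rough illustration of that suggestion, here is a minimal sketch that uses extract_bracketed from Text::Balanced to drop every balanced < ... > group, including nested ones. The sub name strip_nested and the prefix pattern are assumptions made for this example; unbalanced input is simply left in place.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# strip_nested is a hypothetical helper name, not part of Text::Balanced.
sub strip_nested {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # Skip everything up to the next <, then try to extract a balanced group.
        my ($extracted, $remainder, $prefix) =
            extract_bracketed($text, '<>', qr/[^<]*/);
        if (!defined $extracted) {
            # No (balanced) group left: keep the rest unchanged.
            $out .= $text;
            last;
        }
        $out .= $prefix;       # keep the text before the group
        $text = $remainder;    # continue after the group
    }
    return $out;
}

print strip_nested("a <b <c> d> e"), "\n";
```

With a plain regex such as s/<.*?>//gs the same input would leave " d> e" behind, which is the kind of mess the answer warns about.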


Danny


Good advice, but not needed in this case. Will definitely keep in mind though. – rlbond Apr 10 '09 at 20:55

score 6 · Answer 2 · edited Apr 10 '09 at 15:05


In Perl:

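#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input so that a < ... > pair may span several lines.
my $text = do { local $/; <> };
$text =~ s/<[^>]*>//g;
print $text;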

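The regex matches everything from a < through the next > (inclusive) and replaces it with nothing. Because [^>] also matches newlines, a pair may span several lines, provided the whole input has been read into one string. The /g modifier makes the substitution global, so every such pair is removed, not just the first.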
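To see the substitution at work on the sample from the question, here is a small self-contained sketch. The sub name strip_angle and the inline sample string are assumptions for illustration; a real script would read the text from a file or STDIN instead.

```perl
use strict;
use warnings;

# strip_angle is a hypothetical helper name used only for this example.
sub strip_angle {
    my ($text) = @_;
    # [^>]* also matches newlines, so no /s modifier is needed here.
    $text =~ s/<[^>]*>//g;
    return $text;
}

# The sample input from the question, held in a single string.
my $sample = join '',
    "Hello, world. I <enjoy eating\n",
    "bagels. They are quite tasty.\n",
    "I prefer when I ate a bagel to\n",
    "when I >ate a sandwich. <I also\n",
    "like >bananas.\n";

print strip_angle($sample);   # prints: Hello, world. I ate a sandwich. bananas.
```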

EDIT: incorporated comments from Hynek and chaos

edited Apr 10 '09 at 15:05

brian d foy


answered Apr 10 '09 at 14:28

OtherDevOpsGene


It's a little inefficient to split it and join it again. perl -0777 -pe 's/<[^>]*>//gm' – Hynek -Pichi- Vychodil Apr 10 '09 at 14:38
The /m modifier isn't helping. It means 'treat as multiline', i.e. match ^ and $ at newlines, not 'this is multiline'. /s, treat as single line, is actually more what you'd want, but you don't need it because your pattern doesn't use ., which is the only thing /s affects. – chaos Apr 10 '09 at 14:46

I would put both angle brackets in the negated character class: s/<[^<>]*>//g. Otherwise, a match could begin at an earlier, unclosed <, which probably isn't what you want. – Alan Moore Apr 10 '09 at 18:16
Very useful. Chaos's answer, however, is more adaptable to multi-character delimiters, i.e. using . and /s rather than [^(delimiter)]. +1 for great advice though. – rlbond Apr 10 '09 at 20:56
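The points raised in these comments can be sketched in a few lines. The sub name strip_between and the [% ... %] delimiters are assumptions chosen for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything between two arbitrary delimiter strings.
sub strip_between {
    my ($text, $open, $close) = @_;
    # \Q...\E quotes regex metacharacters in the delimiters.
    # /s lets . match newlines, so a span may cross line boundaries.
    $text =~ s/\Q$open\E.*?\Q$close\E//gs;
    return $text;
}

print strip_between("keep [% drop\nthis %] keep\n", '[%', '%]');

# The two character-class variants behave differently on a stray <.
my $s = "a << b <c> d";
(my $loose  = $s) =~ s/<[^>]*>//g;    # removes "<< b <c>"
(my $strict = $s) =~ s/<[^<>]*>//g;   # removes only "<c>"
print "$loose\n$strict\n";
```

The lazy .*? with /s generalizes to delimiters of any length, while the negated character class only works for single-character delimiters.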

score 4 · Accepted Answer · answered Apr 10 '09 at 14:51


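# Undefine the input record separator so <> reads the whole input at once.
local $/;
my $text = <>;
# /s lets . match newlines; .*? stops at the first > after each <.
$text =~ s/<.*?>//gs;
print $text;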


chaos


If your string looks like this: ghi>, your regex leaves 'ghi>'. If nested or escaped brackets and other perverse cases "never happen" the regex is fine. To handle the perverse cases, use Text::Balanced, even though the interface is weird. – daotoad Apr 10 '09 at 16:26

Hynek -Pichi- Vychodil · Answer 4 · 2009-04-10T15:34:36.563

1

An inefficient one-liner, which reads the whole input into memory:

perl -0777 -pe 's/<.*?>//gs'

The same as a program:

local $/;                 # slurp mode: read the whole input at once
my $text = <>;
$text =~ s/<.*?>//gs;     # /s lets . match newlines
print $text;

Which one to use depends on how big the text you want to convert is. Here is a more efficient one-liner that processes the input line by line:

perl -pe 'if ($a) {(s/.*?>// and do {s/<.*?>//g; $a = s/<.*//s;1}) or $_=q{}} else {s/<.*?>//g; $a = s/<.*//s}'

The same as a program:

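my $in_tag;    # true while a < from an earlier line is still unclosed ($a in the one-liner)
while (<>) {
    if ($in_tag) {
        if (s/.*?>//) {              # the pending < closes on this line
            s/<.*?>//g;              # remove complete pairs
            $in_tag = s/<.*//s;      # drop a trailing unclosed <, newline included
        }
        else { $_ = q{} }            # still inside: drop the whole line
    }
    else {
        s/<.*?>//g;
        $in_tag = s/<.*//s;
    }
    print;
}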
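For experimenting with the line-by-line approach outside a filter script, the same state machine can be wrapped in a sub that takes a list of lines. This is a restructured sketch, not the answer's exact code; the names strip_streaming and $inside are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines, returns the filtered text.
sub strip_streaming {
    my @lines  = @_;
    my $inside = 0;     # true while an opening < is still waiting for its >
    my $out    = '';
    for my $line (@lines) {
        if ($inside) {
            # Drop everything through the closing >;
            # if there is none, skip the whole line.
            next unless $line =~ s/\A[^>]*>//;
            $inside = 0;
        }
        $line =~ s/<[^>]*>//g;               # complete pairs on this line
        $inside = 1 if $line =~ s/<.*//s;    # unclosed <: drop the rest, newline included
        $out .= $line;
    }
    return $out;
}

my @sample = (
    "Hello, world. I <enjoy eating\n",
    "bagels. They are quite tasty.\n",
    "I prefer when I ate a bagel to\n",
    "when I >ate a sandwich. <I also\n",
    "like >bananas.\n",
);

print strip_streaming(@sample);   # prints: Hello, world. I ate a sandwich. bananas.
```

Only one line is held in memory at a time, which is the advantage over slurping when the input is large.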

edited Apr 10 '09 at 15:34

answered Apr 10 '09 at 14:40

Hynek -Pichi- Vychodil


As noted re CoverosGene's answer, /m isn't necessary or helpful. – chaos Apr 10 '09 at 14:48
