sed newline and carriage return pattern capture

Question

I have the following block of text (with \r\n or \n) and I would like to find and delete it with sed.

<?php
/*
*/
?>

I've tried many embarrassing things (based on many SE answers) to remove this, all of which have failed miserably. So rather than muddy the waters: what is the right way to capture and delete this pattern? Using two separate sed commands, one for \n and one for \r\n, is fine too.

Ok, I'll share two poor attempts:

sed 'N;s/<\?php\r\n\/\*\r\n\*\/\r\n\?>//g' file.txt

sed ':a;N;$!ba;s/<\?php\r\n\/\*\r\n\*\/\r\n\?>//g' file.txt

EDIT: Based on the answer below I tried to put this into a recursive Perl routine that searches for .php files and modifies them. However, $text ends up undefined. The error is "Use of uninitialized value $text in print at [line "print $text"]".

Sorry, I've not used perl before...

#!/usr/bin/perl

use strict;
use warnings;

my $parent_dir = ".";
my $dir="";
my $file="";
process_dir($parent_dir);

sub process_dir {
        my $dir = shift;
        print "Processing $dir\n";
        opendir(my $SCR , $dir) or die "Can't open $dir: $!";
        while( defined (my $file = readdir $SCR) )
        {
           next if ($file =~ /\.$/ );
           if ( $file =~ /\.php$/ ) {
             &process_file();
           } elsif ( -d "$dir/$file" ) {
             print "directory : $dir/$file\n";
             process_dir("$dir/$file/");
             #next;
           #} elsif ( $file
           } else {
                print "Else :$file\n" if ( -B "$dir/$file");
           }
           print "file -> $file\n";
        }
    closedir($SCR);
}

sub process_file{

    my $text="";
    open(my $fh, '<', "$dir/$file") or die "cannot open file $file";
    {
        local $/;
        $text = <$fh>;
    }
    close($fh); 

    print "Before:\n";

    print $text;

    $text =~ s{ <\?php \s* \r?\n \s* /\* \s* \r?\n \s* \*/ \s* \r?\n \?> \s* \r?\n }{}gmx;

    print "After:\n";
    print $text;
}

Ouch...does it have to be `sed`? I don't think it is the tool I'd use for this job. I'd probably use `perl`, but it isn't particularly pretty even then. How big is the file? Presumably not all that big (more likely kilobytes than megabytes), so slurp and process is likely appropriate. — Jonathan Leffler, Aug 03 '13 at 19:47
Some files are 20-30Kb. But there are 100's so you see why I need some sort of tool to recurse through them all. Perl would be fine, but I don't know it very well. — user6972, Aug 03 '13 at 19:54
These days, the files would have to be multiple megabytes (maybe even gigabytes) before deciding that slurping the whole file into memory is a bad idea. While they're still measured in kilobytes, they're easy to manage — no machine is going to break sweat handling it. — Jonathan Leffler, Aug 03 '13 at 20:03

potong · Answer 1 · 2013-08-04T08:41:36.140

This might work for you (GNU sed):

sed ':a;$!{N;ba};s/\n\?<?php\r\?\n\/\*\r\?\n\*\/\r\?\n?>//g' file

This slurps the whole file into the pattern space then deletes the required string.

The regexp makes use of \?, which means expect one or zero of the preceding pattern (in the general case \r, or in the very first case \n).

White space may be an unseen problem, in which case:

sed ':a;$!{N;ba};s/\n\?\s*<?php\s*\r\?\n\s*\/\*\s*\r\?\n\s*\*\/\s*\r\?\n\s*?>//g' file
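
One way to check the slurp-and-substitute idea is to run it against small generated samples with both kinds of line ending. The following is only a sketch and assumes GNU sed (for \r, \? and labels terminated by ;). The helper name strip_block and the sample file names are made up for illustration. It uses the :a;N;$!ba loop from the question, and as a variation it consumes the line ending after ?> instead of the newline before <?php.

```shell
# Hypothetical sample files in a scratch directory: one LF, one CRLF.
workdir=$(mktemp -d)
printf 'before\n<?php\n/*\n*/\n?>\nafter\n' > "$workdir/lf.txt"
printf 'before\r\n<?php\r\n/*\r\n*/\r\n?>\r\nafter\r\n' > "$workdir/crlf.txt"

# Slurp the whole file into the pattern space, then delete the block together
# with the line ending that follows "?>". Each \r\? makes the CR optional,
# so the same expression covers \n and \r\n files.
strip_block() {
    sed ':a;N;$!ba;s/<?php\r\?\n\/\*\r\?\n\*\/\r\?\n?>\r\?\n//g' "$1"
}

# od -c makes any carriage returns in the result visible.
strip_block "$workdir/lf.txt" | od -c
strip_block "$workdir/crlf.txt" | od -c
```

The reason for the variation: on CRLF input, removing the \n in front of <?php leaves the \r of the previous line behind, so anchoring on the trailing line ending keeps the neighbouring lines intact. The block must be followed by a newline for this version to match.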

Jonathan Leffler · Accepted Answer · 2013-08-03T21:41:21.030

Basic Perl script

I'd probably use Perl for this job. Assuming the file is small enough that slurping the whole file into memory is a reasonable strategy, then this code seems to do the job:

#!/usr/bin/env perl
use strict;
use warnings;

my $text;
{
local $/;
$text = <>;
}

print "Before:\n";
print $text;

$text =~ s{ <\?php \s* \r?\n \s* /\* \s* \r?\n \s* \*/ \s* \r?\n \?> \s* \r?\n }{}gmx;

print "After:\n";
print $text;

The first three lines are standard startup code. The next five read the whole file into the variable $text. The printing lines are self-explanatory. The substitute command is where all the fun is.

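The pattern is between the first {} pair; the replacement text (empty here) is between the second {} pair. The modifiers at the end repeat the substitution throughout the text (g) and enable extended notation (x), so that spaces in the regex are not significant. The m modifier only changes where ^ and $ match; since this pattern uses neither, it is harmless here and is not what lets the match span newlines.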

The match pattern looks for <?php followed by zero or more whitespace characters (\s*), an optional carriage return (\r?), and a newline (\n). That whitespace, carriage return, and newline sequence appears 4 times, once for each line ending in the block you want to match. The other parts allow zero or more whitespace characters before /* and before */; as written, ?> has to start its line. Remember that * and ? are special characters in a regex and must be escaped to match them literally.

Sample output

Before:

aasdasdsa
sdasdsada
<?php
/*
*/
?>
sdasdasda
asdsdasas

After:

aasdasdsa
sdasdsada
sdasdasda
asdsdasas

Recursive code

#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;

find({ wanted => \&process_file, no_chdir => 1 }, @ARGV);

sub process_file
{
    my $name = $_;
    return unless -f $name;
    print "$name\n";
    open my $fh, '+<', $name or die "Failed to open file $name for reading and writing: $!";
    my $text;
    {
    local $/;
    $text = <$fh>;
    }
    $text =~ s{ <\?php \s* \r?\n \s* /\* \s* \r?\n \s* \*/ \s* \r?\n \?> \s* \r?\n }{}gmx;
    seek $fh, 0, 0;
    truncate $fh, 0;
    print $fh $text;
    close $fh;
}

Error handling leaves much to be desired; the die should probably be replaced by print (to standard error) and return.

Thanks! Not knowing perl well how would I modify this to recurse through a file structure using a pattern? I couldn't get this http://stackoverflow.com/questions/5241692/perl-script-to-recursively-list-all-filename-in-directroy to work — user6972, Aug 03 '13 at 21:06
I think I'm close if you could take a look at my mistake in my edited question it would really help me. — user6972, Aug 03 '13 at 21:41
