Here's a Perl script that is (mostly) equivalent to the one-liner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that the script has strict and warnings enabled, which is strongly recommended). You could expand on it by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;

if (!@ARGV) {                # if no files on command line
    @ARGV = glob('*.xml');   # get a default list of files
}
local $^I = '';              # enable inplace editing (like perl -i)
while (<>) {                 # read each line of each file into $_
    s/\n\n/\n/g;             # modify $_ with a regex
    # more regexes here...
    print;                   # write the line $_ back out
}
You can save this script in a file such as process.pl and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions; there are lots of Perl modules for XML processing (I wrote about that some more here). Also, in the example you showed, s/\n\n/\n/g actually won't have any effect: when reading files line by line, each string ends at its first \n, so no string will ever contain two of them (you can change how Perl reads files, but I don't see any mention of that in the question).
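To illustrate the point about line-by-line reading, here is a minimal sketch that runs the same regex over a string twice: once in the default line mode, and once in "slurp" mode, where $/ (the input record separator) is undefined and the whole input arrives as a single string. The helper name squeeze_newlines and the sample string are made up for this illustration, and it reads from an in-memory filehandle rather than from your actual files.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper: apply s/\n\n/\n/g to $text, reading it either
# line by line (the default) or all at once ("slurp" mode).
sub squeeze_newlines {
    my ($text, $slurp) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = $slurp ? undef : "\n";   # undef = read everything in one go
    my $out = '';
    while (my $chunk = <$fh>) {
        $chunk =~ s/\n\n/\n/g;          # same regex as in the question
        $out .= $chunk;
    }
    close $fh;
    return $out;
}

my $sample = "<a>\n\n<b/>\n\n</a>\n";   # made-up sample input

# Line mode: no chunk ever contains two newlines, so nothing changes.
print squeeze_newlines($sample, 0) eq $sample
    ? "line mode: unchanged\n"
    : "line mode: changed\n";

# Slurp mode: the regex sees the whole input and removes the blank lines.
print squeeze_newlines($sample, 1);
```

On the command line, the -0777 switch has the same effect as undefining $/, so perl -0777 -p -i -e 's/\n\n/\n/g' *.xml would process each file as one string. Whether that is what you want depends on the size of your files, since each one is held in memory in full.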
Edit: You've named the script in your example unicode.sh. If you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as what I've shown above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
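As a small taste of what perlunitut covers, the sketch below shows the difference between a string of UTF-8 bytes and the decoded characters, using the core Encode module. It isn't specific to your files: the sample string and the helper name utf8_char_count are made up for illustration.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use Encode qw(decode);

# Hypothetical helper: count characters (not bytes) in a UTF-8 byte string.
sub utf8_char_count {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes in, characters out
    return length $chars;
}

my $bytes = "caf\xC3\xA9";                 # the UTF-8 bytes for "café"
print "bytes: ", length($bytes), "\n";     # counts the raw bytes
print "chars: ", utf8_char_count($bytes), "\n";
```

The general pattern is to decode input as early as possible, work on characters, and encode again on output; how that fits together with in-place editing depends on what your files look like.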