
I'm fairly new to the whole coding game, and am very grateful for every answer!

I am working on a directory with many .txt files in it, and have a file with a looong list of regexes like "perl -p -i -e 's/\n\n/\n/g' *.xml". They all work if I copy them to the terminal, but is there a possibility to run them straight from the file? I tried ./unicode.sh, but that resulted in:

No such file or directory.

Any ideas?

Thank you so much!


2 Answers


Here's a (mostly) equivalent Perl script to the one-liner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended). You could expand on it by putting more code to modify the current line in the body of the while loop.

#!/usr/bin/env perl
use warnings;
use strict;

if (!@ARGV) {               # if no files on command line
    @ARGV = glob('*.xml');  # get a default list of files
}
local $^I = '';             # enable inplace editing (like perl -i)
while (<>) {                # read each line of each file into $_
    s/\n\n/\n/g;            # modify $_ with a regex
    # more regexes here...
    print;                  # write the line $_ back out
}

You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.

On the other hand, you really shouldn't modify XML files with regular expressions; there are lots of Perl modules for XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect: when reading files line by line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).

Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as I've shown above. You'll have to tell us more about what you're doing, and show some example input and output, to get suggestions on that. See also e.g. perlunitut.

haukex
  • Thank you. I have huge text files (all in all about 20 GB) that contain symbols as well as some emojis. I am now replacing those symbols and emojis (about 2000) with a number e.g. would be 1F600 I have a table for those replacements but find it rather hard to find a quick and easy way, that's how I came up with the regex - easy but very slow... – somanyquestions Jan 23 '19 at 10:30
  • @somanyquestions Well, you could share your code (a representative [MCVE](https://stackoverflow.com/help/mcve) with sample input and output) and you might get some tips on optimization. Note that if you're just replacing Unicode characters with their codes, and you've opened the files with the correct encoding layer, then you don't need a table, for example: `s/(\P{ASCII})/sprintf("%02X",ord $1)/eg` (but if this is XML, then I'd still be very careful with this, as I mentioned in the answer). – haukex Jan 23 '19 at 14:46

If you got "No such file or directory", it's likely that you forgot to make unicode.sh executable, as in chmod +x unicode.sh, assuming that's a script that you wrote.

Of course, the normal way to run multiple Perl commands is to put them in a file such as runme.pl - that is, a Perl script.

That said, yes, everything will work from the terminal; you just need to be careful about the escaping that bash performs.

Dean C Wills
  • Thanks Dean! Beginner's mistake, I guess :/ After that I got "env: bash\r: No such file or directory", but I could get rid of it easily with sed $'s/\r$//'. Now it's running smoothly, while I try to find out how I ended up with \r... Thanks again for the fast help! – somanyquestions Jan 21 '19 at 17:22
  • There is no reason it shouldn't work; as you say, sed is a better bet here. You need to set the bash shell in the shebang, #!/usr/bin/bash or equivalent – M__ Jan 22 '19 at 12:28
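The "env: bash\r" error from the comments comes from Windows (CRLF) line endings: the kernel sees the interpreter name as bash followed by a carriage return. A minimal reproduction and fix (the file name crlf.sh is made up; the sed \r escape assumes GNU sed):

```shell
# a script saved with CRLF line endings: every line, including the
# shebang, ends in a stray "\r"
printf '#!/bin/sh\r\necho ok\r\n' > crlf.sh
chmod +x crlf.sh
sed -i 's/\r$//' crlf.sh   # strip the trailing carriage returns (GNU sed)
./crlf.sh                  # prints: ok
```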