I'm creating several lexicons for a word recognition program, each containing only the first x sounds of a word (henceforth ngram). To do so, I extract the needed words from an existing lexicon. I would like to do this automatically, i.e. find all words for a given ngram (e.g. ngram = 3), save them, increase ngram (= 4), and repeat the process. The code looks like this:

ngrams=$(seq 3 1 9)
for ngram in $ngrams
do

cat /Lexicon/whole_lexicon.lex | perl -ne 'chomp; @tok = split(/\s+/); $ntoprint = $#tok; if ($ngram < $ntoprint) {$ntoprint = $ngram}; for ($i = 1; $i <= $ntoprint; $i++) {printf("%s\t%s\n", join("", @tok[1..$i]), join(" ", @tok[1..$i])); }' > lexicons/lex$ngram.txt

done

Unfortunately, the value of $ngram is not recognised by Perl, and the command does not work properly. For comparison, this script works:

ngram=3
cat /Lexicon/whole_lexicon.lex | perl -ne 'chomp; @tok = split(/\s+/); $ntoprint = $#tok; if (3 < $ntoprint) {$ntoprint = 3}; for ($i = 1; $i <= $ntoprint; $i++) {printf("%s\t%s\n", join("", @tok[1..$i]), join(" ", @tok[1..$i])); }' > lexicons/lex$ngram.txt

After some research, I now know that I could write a Perl script and pass the value of $ngram to it, where I can read it from @ARGV. However, I'm looking for a solution that lets me just run a command in the terminal.

hyhno01

2 Answers

Perl doesn't have access to the shell's variables, and the shell doesn't change anything inside single quotes. There is no "invalid substitution" here because no substitution takes place at all. Perl instead sees its own, undefined $ngram, which counts as 0 in the numeric comparison, so the loop never prints anything. The solution is to pass the value to Perl as an argument, or (less ideally) have the shell inject the value into the Perl source, e.g. by switching from single to double quotes around part of the Perl script.

for ngram in $(seq 3 1 9)
do
    perl -ne 'BEGIN { $ngram = shift @ARGV; }
        chomp;
        @tok = split(/\s+/);
        $ntoprint = $#tok;
        if ($ngram < $ntoprint) {$ntoprint = $ngram};
        for ($i = 1; $i <= $ntoprint; $i++) {
           printf("%s\t%s\n", join("", @tok[1..$i]), join(" ", @tok[1..$i]));
        }' "$ngram" < /Lexicon/whole_lexicon.lex > lexicons/"lex$ngram.txt"
done

This also removes the useless cat and fixes a minor quoting error.

tripleee
Passing the shell variable to Perl through the environment is another approach, much less awful than the referenced injection. – Charles Duffy Nov 07 '19 at 17:21

In your original code, $ngram is a shell variable. But make it into an environment variable and Perl will be able to access it through the special hash %ENV.

ngrams=$(seq 3 1 9)    # same list as in the question
export ngram           # upgrade $ngram from shell to environment variable
for ngram in $ngrams
do
    perl -ne 'chomp; @tok = split(/\s+/); $ntoprint = $#tok;
          if ($ENV{ngram} < $ntoprint) {$ntoprint = $ENV{ngram}};
          for ($i = 1; $i <= $ntoprint; $i++) {
              printf("%s\t%s\n", join("", @tok[1..$i]), join(" ", @tok[1..$i]));
          }' < /Lexicon/whole_lexicon.lex > lexicons/"lex$ngram.txt"
done
mob