I'm creating several lexicons for a word recognition program, each containing only the first x sounds of a word (henceforth ngram). To do so, I extract the needed words from an existing lexicon. However, I would like to do this automatically, i.e. find all words for an ngram (e.g. ngram = 3), save them, increase the ngram (to 4) and repeat the process. The code looks like this:
```shell
ngrams=$(seq 3 1 9)
for ngram in $ngrams
do
    cat /Lexicon/whole_lexicon.lex | perl -ne 'chomp; @tok = split(/\s+/); $ntoprint = $#tok; if ($ngram < $ntoprint) {$ntoprint = $ngram}; for ($i = 1; $i <= $ntoprint; $i++) {printf("%s\t%s\n", join("", @tok[1..$i]), join(" ", @tok[1..$i])); }' > lexicons/lex$ngram.txt
done
```
Unfortunately, the value of `$ngram` is not recognised by Perl, and the command does not work properly. For comparison, this script does work:
```shell
ngram=3
cat /Lexicon/whole_lexicon.lex | perl -ne 'chomp; @tok = split(/\s+/); $ntoprint = $#tok; if (3 < $ntoprint) {$ntoprint = 3}; for ($i = 1; $i <= $ntoprint; $i++) {printf("%s\t%s\n", join("", @tok[1..$i]), join(" ", @tok[1..$i])); }' > lexicons/lex$ngram.txt
```
After some research, I now know that I could write a Perl script and pass the value of `$ngram` to it, where I can read it from `@ARGV`. However, I'm looking for a solution that lets me just run a command in the terminal.