I am running the following Awk script to extract fastText vectors on Ubuntu 22.04.2 LTS (Jammy Jellyfish). However, I always get the same error:

```
awk: lines 5 and 13: unexpected character 0xe2
```
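The Awk script reads a .txt wordlist and writes the fastText vectors for those words into a new file. I run it with:

```
$ awk -f combine.awk
```

where `combine.awk` contains:

```awk
BEGIN {
    infile = "adjectives.txt"
    while ((getline < infile) > 0) {   # read the wordlist line by line
        INCLUDE[$1] = 1                # remember each word (first field)
    }
    close(infile)
    infile = "cc.en.300.vec"
    outfile = "fasttextvectors_adjectives.txt"
    system("rm -f " outfile)           # delete output from a previous run
    while ((getline < infile) > 0) {   # read the vector file line by line
        if ($1 in INCLUDE) print >> outfile   # keep vectors of listed words
    }
    close(infile)
    close(outfile)
}
```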
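For comparison, the same filtering can be sketched as a single Awk command using the common two-file idiom, in which `NR == FNR` is true only while the first file is being read. This is not the script from the question; `filter_vectors` is a hypothetical shell function name, and the sketch assumes, like the script above, that the word is the first whitespace-separated field on each line of both files.

```shell
# filter_vectors is a hypothetical helper name used only for this sketch.
# Arguments: the wordlist file, then the fastText .vec file.
# While awk reads the first file, NR == FNR holds: store the word as an
# array key and skip to the next line. For the second file, the pattern
# `$1 in keep` prints every line whose first field is a stored word.
filter_vectors() {
    awk 'NR == FNR { keep[$1]; next } $1 in keep' "$1" "$2"
}

# Usage with the file names from the script above:
# filter_vectors adjectives.txt cc.en.300.vec > fasttextvectors_adjectives.txt
```

Because the shell's `>` truncates the output file before writing, this version does not need a separate `rm` step.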
**I suspect the problem is in the Awk script itself, but I have seen someone run the same script successfully on a Mac. Is this something specific to Ubuntu?**

**I've already tried:**
- Making sure the wordlist contains no words with special characters;
- Saving the .txt list with UTF-8 encoding for Mac, Linux, and Windows;
- Making sure the file names contain no special characters either.
Still, I always get the same error:

```
awk: lines 5 and 13: unexpected character 0xe2
```

There are no special characters in the wordlist itself.
These are lines 5 and 13 of the Awk script (could the special character be `$`?):

```awk
INCLUDE[$1]=1
if ($1 in INCLUDE) print >> outfile
```
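`$` itself is plain ASCII (byte 0x24), so it would not produce this error. In UTF-8, 0xe2 is the first byte of every three-byte sequence for characters U+2000 to U+2FFF, a range that includes typographic quotes, dashes, and special or zero-width spaces, which can slip in when code is copied from a web page or a document. A minimal sketch for finding such bytes in the script, assuming GNU grep (the default on Ubuntu) and that the script is saved as `combine.awk`; `find_non_ascii` is a hypothetical helper name:

```shell
# find_non_ascii is a hypothetical helper name used only for this sketch.
# With LC_ALL=C, GNU grep treats each byte as a single character, so the
# bracket expression matches any byte that is neither printable ASCII
# (0x20-0x7E) nor whitespace, e.g. the 0xe2 lead byte of a UTF-8 sequence.
# -n prefixes each matching line with its line number.
# grep exits with status 1 (and prints nothing) when no line matches.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Example: find_non_ascii combine.awk
# To see the exact bytes of one reported line (here line 5), hex-dump it:
# sed -n '5p' combine.awk | od -An -tx1
```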
Any help would be greatly appreciated. I am a student and just a beginner with word embeddings and vectors.

Thank you!