I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful.
I just want to train a GloVe model on my own corpus (a ~900 MB corpus.txt file).
I downloaded the files provided in the link above and compiled them using Cygwin (after editing the demo.sh file and changing it to VOCAB_FILE=corpus.txt. Should I leave CORPUS=text8 unchanged?).
The output was:
- cooccurrence.bin
- cooccurrence.shuf.bin
- text8
- corpus.txt
- vectors.txt
How can I use those files to load the result as a GloVe model in Python?
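
As far as I understand, vectors.txt is plain text with one word per line, followed by its vector components separated by spaces. Under that assumption, a minimal sketch of reading it into a Python dict could look like the following (load_glove_vectors is a hypothetical helper name, not part of the GloVe distribution, and it uses only the standard library):

```python
def load_glove_vectors(path):
    """Read a GloVe-style text file into a dict mapping word -> list of floats.

    Assumes each line looks like: word v1 v2 ... vn (space-separated).
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word = parts[0]
            vectors[word] = [float(x) for x in parts[1:]]
    return vectors
```

Calling load_glove_vectors("vectors.txt") would then give a word-to-vector lookup. Whether the vectors are meaningful still depends on the training run having used the intended corpus.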