I tried to run the following example on DL4J, loading the pre-trained Google News vectors file:
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Collection;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;

public class NearestWordsDemo {
    public static void main(String[] args) throws IOException {
        File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
        Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        for (;;) {
            System.out.print("Word: ");
            String word = br.readLine();
            if ("EXIT".equals(word)) break;
            Collection<String> lst = vec.wordsNearest(word, 20);
            System.out.println(word + " -> " + lst);
        }
    }
}
But it is super slow: a single wordsNearest call takes ~10 minutes, although the results it returns are correct.
There is enough memory (-Xms20g -Xmx20g).
When I run the same query with the original word2vec implementation from https://code.google.com/p/word2vec/, it returns the nearest words almost instantly.
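For reference, the speed of the original C tool comes from a simple design: it unit-normalizes every vector once at load time, so each query is just one dot-product pass over the vocabulary followed by a top-k selection. A minimal self-contained sketch of that approach on a toy vocabulary (class and method names are my own, not part of any library):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NearestWords {

    // Unit-normalize every vector in place; the C tool does this once at
    // load time so cosine similarity reduces to a plain dot product.
    static void normalize(float[][] vectors) {
        for (float[] v : vectors) {
            double norm = 0;
            for (float x : v) norm += x * x;
            norm = Math.sqrt(norm);
            if (norm > 0) {
                for (int i = 0; i < v.length; i++) v[i] /= norm;
            }
        }
    }

    // One dot-product pass over the vocabulary, then sort by similarity
    // (descending) and keep the k best matches.
    static List<String> nearest(String[] vocab, float[][] vectors,
                                float[] query, int k) {
        double[] score = new double[vocab.length];
        Integer[] idx = new Integer[vocab.length];
        for (int i = 0; i < vocab.length; i++) {
            idx[i] = i;
            double dot = 0;
            for (int j = 0; j < query.length; j++) {
                dot += vectors[i][j] * query[j];
            }
            score[i] = dot;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(score[b], score[a]));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, idx.length); i++) {
            out.add(vocab[idx[i]]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny 2-D toy vocabulary just to exercise the method.
        String[] vocab = {"king", "queen", "apple"};
        float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}};
        normalize(vectors);
        // Query with "king"'s own vector; the top hit is the word itself.
        System.out.println("king -> " + nearest(vocab, vectors, vectors[0], 2));
        // prints: king -> [king, queen]
    }
}
```

On the 3M-word, 300-dimension Google News model this is roughly 900M multiply-adds per query, which a tuned native loop finishes in well under a second; so the ~10-minute wordsNearest time suggests overhead somewhere other than the raw arithmetic.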
DL4J uses ND4J, which claims to be twice as fast as Numpy: http://nd4j.org/benchmarking
Is there anything wrong with my code?
UPDATE: The code is based on https://github.com/deeplearning4j/dl4j-0.4-examples.git (I didn't touch any dependencies, just tried to read the Google pre-trained vectors file). Word2VecRawTextExample works just fine (but its data set is relatively small).