
I tried to execute the following example on DL4J (loading a pre-trained vectors file):

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.Collection;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;

// Load the pre-trained Google News vectors (binary format, hence "true")
File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);

BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

for (;;) {
    System.out.print("Word: ");
    String word = br.readLine();

    if ("EXIT".equals(word)) break;

    Collection<String> lst = vec.wordsNearest(word, 20);
    System.out.println(word + " -> " + lst);
}

But it is super slow: it takes ~10 minutes to compute the nearest words, although the results are correct.

There is enough memory (-Xms20g -Xmx20g).

When I run the same Word2Vec example from https://code.google.com/p/word2vec/ it gives the nearest words very quickly.

DL4J uses ND4J, which claims to be twice as fast as NumPy: http://nd4j.org/benchmarking

Is there anything wrong with my code?

UPDATE: It is based on https://github.com/deeplearning4j/dl4j-0.4-examples.git (I didn't touch any dependencies; I just tried to read the Google pre-trained vectors file). Word2VecRawTextExample works just fine (but its data size is relatively small).

Eugene Retunsky

2 Answers


To improve performance, I suggest the following:

  1. Set the environment variable OMP_NUM_THREADS to the number of logical cores on your machine.

  2. Install the Intel Math Kernel Library (MKL) if you have an Intel processor.

  3. Add the directory containing mkl_intel_thread.dll from the Intel Math Kernel Library to your PATH.
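For step 1, you need your logical core count; a quick way to print it from the JVM itself (this is just a convenience sketch, not part of the DL4J setup):

```java
public class CoreCount {
    public static void main(String[] args) {
        // Number of logical cores visible to the JVM;
        // use this value for OMP_NUM_THREADS
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```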
Yuriy Zaletskyy

This post is really old, but by now things should have improved a lot. I have run DL4J with a Word2vec model in production with the following JVM-level settings, and it works on a t2.large box (8 GB RAM) and up:

java -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=6G -Dorg.bytedeco.javacpp.maxphysicalbytes=6G

Also, I did not use the wordsNearest() method, because it requires the corpus embeddings to be pre-computed; instead I wrote my own cosine similarity, which responds in under a millisecond.
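A minimal sketch of such a cosine-similarity lookup. The toy 3-dimensional vectors below are made up for illustration; with DL4J you would fetch real 300-dimensional vectors via vec.getWordVector(word) instead:

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {
    // Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Toy "embeddings"; replace with vec.getWordVector(word) in DL4J
        Map<String, double[]> embeddings = new HashMap<>();
        embeddings.put("king",  new double[]{0.8, 0.6, 0.1});
        embeddings.put("queen", new double[]{0.7, 0.7, 0.2});
        embeddings.put("car",   new double[]{0.1, 0.2, 0.9});

        // Linear scan for the nearest neighbour of "king"
        double[] query = embeddings.get("king");
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, double[]> e : embeddings.entrySet()) {
            if (e.getKey().equals("king")) continue;
            double s = cosine(query, e.getValue());
            if (s > bestScore) { bestScore = s; best = e.getKey(); }
        }
        System.out.println("Nearest to king: " + best); // prints "Nearest to king: queen"
    }
}
```

A plain linear scan like this over 3M Google News vectors is a few hundred million multiply-adds, which modern hardware does in well under a second; pre-normalizing the vectors once lets you skip the norms in the loop.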

There is a blog post about it here, in case you want to know how to build nearest-word lookup or any other application like text similarity (same basic principle):

https://medium.com/sumvit/building-text-similarity-system-from-ground-up-using-word2vec-and-deeplearning4j-dece9ae4e433

raj3sh3tty