I am trying to build a topic model with 10000 topics on a dataset of 1M samples. After the data is loaded, the process dies with this message:

bin/mallet: line 62: 17428 Killed $JAVA_COMMAND $CLASS $*.

This is the command I am running:

`bin/mallet train-topics \
  --input data.mallet \
  --output-model topics.model \
  --output-topic-keys topic-keys.txt \
  --topic-word-weights-file topic-word-weights.txt \
  --word-topic-counts-file word-topic-counts-file.txt \
  --output-doc-topics doc-topics.txt \
  --num-topics 10000 \
  --num-threads 28 \
  --num-iterations 2000 \
  --use-symmetric-alpha FALSE`

Any suggestions are appreciated.

ak.
  • This looks like a memory problem. You might check [this question about memory allocation](http://stackoverflow.com/questions/726690/who-killed-my-process-and-why). – David Mimno Dec 19 '16 at 15:03
  • @DavidMimno Thank you. I had my suspicions about that. I am using 200G of memory, and it works for 1000 topics, but when I increase to 5K or 10K the process is killed. Does the number of threads affect the memory allocation requirements? – ak. Dec 19 '16 at 15:11
  • Yes, each thread builds its own copy of the type-topic counts. Threads may actually have more effect than the number of topics. – David Mimno Dec 19 '16 at 15:17
  • @DavidMimno Thank you very much. This is very helpful advice. I will try building a model with a lower number of threads and see if I can keep it within the memory limits. – ak. Dec 19 '16 at 15:34
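
To get a feel for why the thread count matters as much as the topic count, here is a rough back-of-envelope sketch. It assumes the worst case in which every thread holds a dense vocabulary-by-topics table of 4-byte counts. MALLET's actual storage is probably more compact, so this likely overestimates. The function name `type_topic_bytes` and the vocabulary size of 100,000 are hypothetical, since the question does not state the vocabulary size.

```python
def type_topic_bytes(vocab_size, num_topics, num_threads, bytes_per_count=4):
    """Worst-case bytes if each thread keeps a dense vocab x topics count table.

    This is an illustrative upper-bound style estimate, not MALLET's real layout.
    """
    return vocab_size * num_topics * bytes_per_count * num_threads


GIB = 1024 ** 3
VOCAB_SIZE = 100_000  # hypothetical: the question does not give the vocabulary size

# Compare the thread count from the question (28) with a smaller one (4),
# at the topic count that worked (1,000) and the one that failed (10,000).
for threads in (28, 4):
    for topics in (1_000, 10_000):
        gib = type_topic_bytes(VOCAB_SIZE, topics, threads) / GIB
        print(f"{threads:>2} threads, {topics:>6} topics: ~{gib:,.1f} GiB")
```

Under these assumptions the estimate grows linearly in both `--num-topics` and `--num-threads`, so dropping from 28 threads to 4 reduces it by the same factor of 7 at any topic count.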
