
I am trying to apply LDA for topic modeling using Gensim's Mallet wrapper in Python. The code I am running is as follows:

import gensim  # gensim < 4.0 only: gensim.models.wrappers (including LdaMallet) was removed in 4.0

MALLET_PATH = 'C:/mallet-2.0.8/bin/mallet'
lda_mallet = gensim.models.wrappers.LdaMallet(mallet_path=MALLET_PATH, corpus=bow_corpus,
                                              num_topics=TOTAL_TOPICS, id2word=dictionary,
                                              iterations=500, workers=16)

Mallet is installed on the C: drive and runs from the Command Prompt (C:\mallet-2.0.8\bin\mallet). The help command also works (import-dir --). Java is installed as well, and the environment variables and PATH have been set for both Mallet and Java. Yet the output shows the following error:

CalledProcessError: Command 'mallet-2.0.8/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\imibh\AppData\Local\Temp\a8b7e6_corpus.txt --output C:\Users\imibh\AppData\Local\Temp\a8b7e6_corpus.mallet' returned non-zero exit status 1.
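Two things can be checked from inside Python, where the error actually occurs, rather than from the Command Prompt: whether the launcher path and `MALLET_HOME` look right to the Python process, and what MALLET itself printed, since the `CalledProcessError` only reports the exit status. One thing worth knowing: environment variables set after the Python or Jupyter session was started are not visible to that session until it is restarted. The sketch below uses only the standard library; `check_mallet_setup` and `run_and_show` are hypothetical helpers, and the Windows path in the usage block is a placeholder.

```python
import os
import subprocess
import sys


def check_mallet_setup(mallet_path):
    """Return a list of problems spotted in the MALLET setup (empty = nothing obvious).

    Hypothetical helper: it only inspects files and environment variables
    as seen by *this* Python process; it does not run MALLET.
    """
    problems = []

    # The path handed to LdaMallet should point at an existing launcher file.
    if not os.path.isfile(mallet_path):
        problems.append(f"mallet_path does not point to a file: {mallet_path!r}")

    # In the MALLET 2.0.8 distribution, bin/mallet is a Unix shell script;
    # the Windows launcher is bin/mallet.bat.
    if sys.platform.startswith("win") and not mallet_path.lower().endswith(".bat"):
        problems.append("on Windows, mallet_path should end in mallet.bat")

    # MALLET_HOME should name the MALLET installation directory.
    mallet_home = os.environ.get("MALLET_HOME")
    if not mallet_home:
        problems.append("MALLET_HOME is not set in this Python process")
    elif not os.path.isdir(mallet_home):
        problems.append(f"MALLET_HOME is not a directory: {mallet_home!r}")

    return problems


def run_and_show(command):
    """Run a command string through the shell; return (exit_code, stdout, stderr).

    Unlike subprocess.check_output, this does not raise on a non-zero exit
    status, so MALLET's own error text can be inspected.
    """
    completed = subprocess.run(
        command,
        shell=True,           # interpret the string as a shell/Command Prompt would
        capture_output=True,  # keep stdout and stderr instead of discarding them
        text=True,            # decode output bytes to str
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Placeholder path: substitute your own installation.
    launcher = r"C:\mallet-2.0.8\bin\mallet.bat"
    for problem in check_mallet_setup(launcher):
        print("setup problem:", problem)

    code, out, err = run_and_show(f'"{launcher}" import-file --help')
    print("exit status:", code)
    print(out)
    print(err)
```

Running the launcher by hand this way shows MALLET's own message (for example a Java error) that the exception hides.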

I have already tried the answers to similar past questions on Stack Overflow, without any improvement.

Would greatly appreciate any help.

Manit

Soroosh Sorkhani

2 Answers


If you are using Windows, you might need to write the path with Windows-style (escaped) backslashes:

MALLET_PATH = "C:\\mallet-2.0.8\\bin\\mallet"
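For reference, in Python the doubled-backslash literal above is exactly the same string as a raw string, and a forward-slash path can be converted to the backslash form with `pathlib.PureWindowsPath`, which only manipulates path text and therefore runs on any OS. This is a small illustrative sketch; the variable names are hypothetical.

```python
from pathlib import PureWindowsPath

# In a normal string literal each backslash must be doubled; a raw string (r"...") avoids that.
escaped = "C:\\mallet-2.0.8\\bin\\mallet"
raw = r"C:\mallet-2.0.8\bin\mallet"
assert escaped == raw  # same characters, two spellings

# PureWindowsPath never touches the filesystem; str() renders backslash separators.
normalized = str(PureWindowsPath("C:/mallet-2.0.8/bin/mallet"))
print(normalized)  # C:\mallet-2.0.8\bin\mallet
```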
Anurag Wagh

Make sure you have installed the Java Development Kit (JDK).
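To confirm from Python that Java is visible to the process running Gensim, you can look up both `java` (the launcher) and `javac` (the compiler, which ships with a JDK but not with a runtime-only install) on `PATH`. This is a minimal sketch; `java_status` is a hypothetical helper. Note that `java -version` writes its version banner to stderr, not stdout.

```python
import shutil
import subprocess


def java_status():
    """Report where `java`/`javac` resolve on PATH and the `java -version` banner."""
    java = shutil.which("java")
    javac = shutil.which("javac")  # present with a JDK; a runtime-only install lacks it
    version = None
    if java is not None:
        # `java -version` prints its banner on stderr.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        version = (result.stderr or result.stdout).strip() or None
    return {"java": java, "javac": javac, "version": version}


if __name__ == "__main__":
    for key, value in java_status().items():
        print(f"{key}: {value}")
```

If `java` comes back as `None` here while it works in the Command Prompt, the Python session is most likely using an older PATH.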

After installing the JDK, the following code for LDA Mallet worked like a charm!

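import os
from gensim.models.wrappers import LdaMallet  # requires gensim < 4.0

# Point MALLET_HOME at the MALLET installation directory
os.environ.update({'MALLET_HOME': r'C:/mallet/mallet-2.0.8/'})
# On Windows, call the batch launcher (mallet.bat), not the Unix shell script bin/mallet
mallet_path = r'C:/mallet/mallet-2.0.8/bin/mallet.bat'

lda_mallet = LdaMallet(
    mallet_path,
    corpus=corpus_bow,  # bag-of-words corpus
    num_topics=n_topics,
    id2word=dct,        # gensim Dictionary mapping token ids to words
)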
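The same idea can be written so that one script works on Windows as well as Linux/macOS: derive the launcher from the installation directory and use `bin/mallet.bat` on Windows and the `bin/mallet` shell script elsewhere. This is a sketch under assumptions: `mallet_launcher` is a hypothetical helper, the install directory is the one from this answer, and `corpus_bow`, `n_topics` and `dct` are assumed to exist as above.

```python
import os
import platform
from pathlib import PurePosixPath, PureWindowsPath


def mallet_launcher(mallet_home, system=None):
    """Return the MALLET launcher path under `mallet_home` as a string.

    `system` defaults to platform.system() ("Windows", "Linux", "Darwin", ...);
    it can be passed explicitly, e.g. for testing.
    """
    system = system or platform.system()
    if system == "Windows":
        return str(PureWindowsPath(mallet_home) / "bin" / "mallet.bat")
    return str(PurePosixPath(mallet_home) / "bin" / "mallet")


if __name__ == "__main__":
    mallet_home = r"C:/mallet/mallet-2.0.8/"  # assumed install directory (from the answer)
    os.environ["MALLET_HOME"] = mallet_home
    mallet_path = mallet_launcher(mallet_home)
    print(mallet_path)
    # Then, with gensim < 4.0 and the answer's variables:
    # lda_mallet = LdaMallet(mallet_path, corpus=corpus_bow, num_topics=n_topics, id2word=dct)
```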

Credit goes to this other answer.