6

I'm getting an error while trying to use gensim's Mallet wrapper in a Jupyter notebook. I have the specified file 'mallet' in the same folder as my notebook, but can't seem to access it. I tried pointing to it on the C drive, but I still get the same error. Please help :)

import os
from gensim.models.wrappers import LdaMallet

#os.environ.update({'MALLET_HOME':r'C:/Users/new_mallet/mallet-2.0.8/'})

mallet_path = 'mallet' # update this path

# LdaMallet is imported directly above, so call it without the gensim. prefix
ldamallet = LdaMallet(mallet_path, corpus=bow_corpus, num_topics=20, id2word=dictionary)

result = ldamallet.show_topics(num_topics=3, num_words=10, formatted=False)
for each in result:
    print(each)

Mallet Error CalledProcessError


Sara
  • Was there any other error output before the Python stack trace? What if, immediately after getting the error, you try the shown command line (`mallet import-file ...`) yourself: is any more info shown? – gojomo Mar 22 '19 at 01:05
  • @gojomo Thank you for reaching out, I appreciate it. I tried the command you listed and I'm still getting the same error :( – Sara Mar 22 '19 at 16:03
  • @gojomo In Command Prompt, that command returns: 'mallet' is not recognized as an internal or external command, operable program or batch file. – Sara Mar 22 '19 at 16:12
  • That suggests that the necessary `mallet` executable either isn't installed, or can't be found from the location where the Python interpreter (and you, when you retry manually) is running. Are you sure it's installed? Can you fix your `mallet_path` variable to actually be a valid path to the `mallet` executable? – gojomo Mar 22 '19 at 18:07
  • @gojomo As far as I can tell, mallet is installed, though I can't seem to find an executable. I've even set up PATH environment variables and triple-checked my path. – Sara Apr 01 '19 at 16:14

9 Answers

4

Update the path to:

mallet_path = 'C:/mallet/mallet-2.0.8/bin/mallet.bat'

and edit mallet.bat (for example in Notepad) in the bin folder of mallet-2.0.8 so that it reads:

@echo off

rem This batch file serves as a wrapper for several
rem  MALLET command line tools.

if not "%MALLET_HOME%" == "" goto gotMalletHome

echo MALLET requires an environment variable MALLET_HOME.
goto :eof

:gotMalletHome

set MALLET_CLASSPATH=C:\mallet\mallet-2.0.8\class;C:\mallet\mallet-2.0.8\lib\mallet-deps.jar
set MALLET_MEMORY=1G
set MALLET_ENCODING=UTF-8

set CMD=%1
shift

set CLASS=
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
if "%CMD%"=="import-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="info" set CLASS=cc.mallet.classify.tui.Vectors2Info
if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
if "%CMD%"=="classify-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Classify
if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.TopicTrainer
if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
if "%CMD%"=="evaluate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
if "%CMD%"=="run" set CLASS=%1 & shift

if not "%CLASS%" == "" goto gotClass

echo Mallet 2.0 commands: 
echo   import-dir        load the contents of a directory into mallet instances (one per file)
echo   import-file       load a single file into mallet instances (one per line)
echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
echo   info              get information about Mallet instances
echo   train-classifier  train a classifier from Mallet data files
echo   classify-dir      classify data from a single file with a saved classifier
echo   classify-file     classify the contents of a directory with a saved classifier
echo   classify-svmlight classify data from a single file in SVMLight format
echo   train-topics      train a topic model from Mallet data files
echo   infer-topics      use a trained topic model to infer topics for new documents
echo   evaluate-topics   estimate the probability of new documents given a trained model
echo   prune             remove features based on frequency or information gain
echo   split             divide data into testing, training, and validation portions
echo   bulk-load         for big input files, efficiently prune vocabulary and import docs
echo Include --help with any option for more information


goto :eof

:gotClass

set MALLET_ARGS=

:getArg

if "%1"=="" goto run
set MALLET_ARGS=%MALLET_ARGS% %1
shift
goto getArg

:run

"C:\Program Files\Java\jdk-12\bin\java" -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%

:eof

On the command line, these commands helped me figure out what was going on:

notepad mallet.bat
java
"C:\Program Files\Java\jdk-12\bin\java"
dir /OD
cd %userdir%
cd %userpath%
cd\
cd users
cd your_username
cd appdata\local\temp\2
dir /OD

The problem is either that Java is not installed correctly, that the PATH does not include Java, or that the MALLET classpath is not defined correctly. More info here: https://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.html . This solved my error; hopefully it helps someone else :)
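The two environment checks described above can also be run from the notebook before calling the wrapper. This is only a minimal sketch using the standard library; the function name `diagnose_java_setup` is made up for illustration and is not part of gensim or MALLET.

```python
import os
import shutil


def diagnose_java_setup(environ=None):
    """Return a list of likely problems; an empty list means none were found."""
    environ = os.environ if environ is None else environ
    problems = []

    # mallet.bat prints an error and exits when MALLET_HOME is unset.
    if not environ.get("MALLET_HOME"):
        problems.append("MALLET_HOME is not set")

    # Unless the launcher hard-codes a full path to java, java must be on PATH.
    if shutil.which("java", path=environ.get("PATH", "")) is None:
        problems.append("java was not found on PATH")

    return problems


# Check the environment the notebook kernel actually sees.
for problem in diagnose_java_setup():
    print(problem)
```

If nothing is printed, neither of these two checks found a problem, although that does not prove the classpath inside mallet.bat is right.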

Drdilyor
Sara
3

Make sure you have installed the Java Development Kit (JDK).

Credit goes to another answer.

After installing the JDK, the following code for LDA Mallet worked like a charm!

import os
from gensim.models.wrappers import LdaMallet

os.environ.update({'MALLET_HOME': r'C:/mallet/mallet-2.0.8/'})
mallet_path = r'C:/mallet/mallet-2.0.8/bin/mallet.bat'

lda_mallet = LdaMallet(
    mallet_path,
    corpus=corpus_bow,
    num_topics=n_topics,
    id2word=dct,
)
  • After installing the JDK, did you add it to the path? I also have a JRE and I've set JAVA_HOME to that, so I'm not sure if I should keep that or use the JDK path... – Mahlatse Sep 08 '21 at 14:23
1

I had the same problem. I moved the mallet folder to C:/new_mallet, and after that it worked nicely:

    import os
    from gensim.models.wrappers import LdaMallet

    os.environ.update({'MALLET_HOME': r'C:/new_mallet/mallet-2.0.8/'})
    mallet_path = 'C:/new_mallet/mallet-2.0.8/bin/mallet'  # update this path
    ldamallet = LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)
Gihan Gamage
1

Using Jupyter Notebook with Python, I ran

conda uninstall gensim
conda install gensim

in cmd as an administrator, then restarted my kernel. It worked like a charm after I had spent hours searching online.

  • Do you recall the versions you switched from and to? I can't seem to get it right with this solution. My gensim version is 3.8.3, as 4.0.1 doesn't have the wrappers module. – Mahlatse Sep 08 '21 at 13:57
0

For me, this was not an import or a path problem.

I spent hours trying to solve it. I tried this solution, but nothing worked.

Looking at a previous successful call I had made to LDA Mallet, I noticed that some parameters were not being set, so I set them explicitly like this:

gensim.models.wrappers.LdaMallet(mallet_path=mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word, prefix='temp_file_', workers=4)

I really hope it helps you. Finding a solution to this problem was a pain.
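For the `prefix` argument in the call above, it can help to point it at a directory you know exists and is writable, since the wrapper writes its intermediate files under that prefix. The sketch below is only an illustration under that assumption; `make_mallet_prefix` is a hypothetical helper, not a gensim function, and I can't say for certain why setting the prefix fixed the error.

```python
import os
import tempfile


def make_mallet_prefix(directory=None, name="temp_file_"):
    """Return a file-name prefix inside a directory that is known to exist."""
    # With no directory given, create a fresh temporary one for this run.
    directory = directory or tempfile.mkdtemp(prefix="mallet_")
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, name)


prefix = make_mallet_prefix()
print(prefix)
```

The resulting string can then be passed as `prefix=prefix` in the `LdaMallet(...)` call shown in this answer.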

0

On Linux, I found that you need to give the explicit path to the mallet binary. The following code works.

from gensim.test.utils import common_corpus, common_dictionary
from gensim.models.wrappers import LdaMallet

mallet_path = "/path/Mallet/bin/mallet"
model = LdaMallet(mallet_path=mallet_path, corpus=common_corpus, num_topics=2, id2word=common_dictionary)
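Before passing the path to the wrapper, it can be worth confirming that it really points at an executable file, because a wrong path only surfaces later as a CalledProcessError. A minimal sketch, assuming a Unix-like system; `resolve_mallet_path` is a hypothetical helper name, not part of gensim.

```python
import os


def resolve_mallet_path(mallet_path):
    """Return an absolute path to the MALLET launcher, or raise if it is unusable."""
    full_path = os.path.abspath(os.path.expanduser(mallet_path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"No MALLET launcher at {full_path}")
    # On Linux and macOS the launcher script also needs the executable bit.
    if not os.access(full_path, os.X_OK):
        raise PermissionError(f"{full_path} is not executable (try chmod +x)")
    return full_path
```

With this in place, `mallet_path = resolve_mallet_path("/path/Mallet/bin/mallet")` fails early with a readable message instead of failing inside the MALLET subprocess.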
Idealist
0

For anyone else who is still struggling and spent hours trying many different suggestions, I finally got it working!

Follow the instructions here (I was on a Mac):

https://ps.au.dk/fileadmin/ingen_mappe_valgt/installing_mallet.pdf

I also closed Anaconda before I started; I don't know if that's important.

In the terminal I got the following error:

(base) myname-MacBook-Air:mallet-2.0.8 myname$ ./bin/mallet
-bash: ./bin/mallet: /bin/bash: bad interpreter: Operation not permitted

Then I followed these instructions to un-quarantine the file:

“bad interpreter: Operation not permitted” Error on El Capitan

I reopened Anaconda and it all worked!
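For reference, the usual way to clear the macOS quarantine flag is the `xattr -d com.apple.quarantine <file>` command. I am assuming that is what the linked post describes. The sketch below wraps it in Python so it can be run from a notebook; both function names are made up for illustration.

```python
import subprocess
import sys


def quarantine_removal_command(path):
    """Build the xattr command that deletes the quarantine attribute from path."""
    return ["xattr", "-d", "com.apple.quarantine", path]


def remove_quarantine(path):
    """Run the command on macOS only; return True if it ran and succeeded."""
    if sys.platform != "darwin":
        return False
    result = subprocess.run(quarantine_removal_command(path), check=False)
    return result.returncode == 0
```

For example, `remove_quarantine("bin/mallet")` would be called from inside the mallet-2.0.8 folder; on other operating systems it does nothing and returns False.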

Nikunj Kakadiya
  • 1
    Welcome to Stack Overflow! Here, no one should have to follow links to get the answer that they need. Put all of the steps and code **in your answer**. – Lakshya Raj Feb 10 '21 at 22:54
0

I fixed the issue by installing the Java JDK: https://docs.oracle.com/en/java/javase/15/install/installation-jdk-macos.html#GUID-F9183C70-2E96-40F4-9104-F3814A5A331F

0

I had the same error because I had forgotten to install Java on my Ubuntu machine.

Axisnix