I started using Mallet a few days ago and am specifically interested in running a hierarchical topic model such as HLDA or HPAM. When I import the sample data files and run them with the cc.mallet.topics.tui.HierarchicalLDATUI class, I get results without any problems.

When I import the Wikipedia article on WW2 and run the same class on it, I get the following error:

$ bin/mallet run cc.mallet.topics.tui.HierarchicalLDATUI --input ww2.mallet
    Exception in thread "main" java.lang.NullPointerException 
    at cc.mallet.topics.HierarchicalLDA$NCRPNode.dropPath(HierarchicalLDA.java:637)
    at cc.mallet.topics.HierarchicalLDA.samplePath(HierarchicalLDA.java:164)
    at cc.mallet.topics.HierarchicalLDA.estimate(HierarchicalLDA.java:133)
    at cc.mallet.topics.tui.HierarchicalLDATUI.main(HierarchicalLDATUI.java:109)

I imported the data like this:

$ bin/mallet import-dir --input ww2Wiki --output ww2.mallet --keep-sequence TRUE --skip-html TRUE --remove-stopwords TRUE

To make your lives easier, here is the code in HierarchicalLDA.java where the error occurs (lines 627-640):

public void dropPath() {
    NCRPNode node = this;
    node.customers--;
    if (node.customers == 0) {
        node.parent.remove(node);
    }
    for (int l = 1; l < numLevels; l++) {
        node = node.parent;
        node.customers--;
        if (node.customers == 0) {
            node.parent.remove(node); //line 637 (producing the error)
        }
    }
}

The error seemingly occurs when the nCRP implementation tries to remove a node whose parent is null: line 637 calls node.parent.remove(node), so the NullPointerException means node.parent is null at that point. I do not know why this happens with some files but not with others.

To check whether the problem is tied to the file itself, I ran the same file through cc.mallet.topics.HierarchicalPAM. There the file works and HPAM produces reasonable results. Other files work with the HLDA implementation, so I do not think the code itself is at fault.

At this point I have no idea what to do. Has anyone encountered and solved this problem before?

Thanks!

PS: I feel I have to point this out for the Java community: this is not my code. It is open-source software that I compiled on my computer. I lack both the time and the overview to read through the whole code base to track down the error.

MrDeal
  • Possible duplicate of [What is a NullPointerException, and how do I fix it?](https://stackoverflow.com/questions/218384/what-is-a-nullpointerexception-and-how-do-i-fix-it) – Zoe Jan 16 '18 at 19:12

1 Answer

It took a while, but I found the answer to the problem, and it seems almost too simple.

HLDATUI (HierarchicalLDATUI) treats each imported file as one document. If only one file was imported, there are not enough documents and the program crashes. One therefore has to import more than one file.
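To illustrate why a single document would hit the exception, here is a minimal sketch that copies only the control flow of the dropPath() method quoted in the question. The class DropPathSketch, the Node type, the fixed NUM_LEVELS, and the helper methods are hypothetical stand-ins, not Mallet's classes, and this is my reading of the quoted snippet rather than a verified trace of Mallet's internals. The assumption is that every document's path passes through the root, so with one document the root's customer count drops to 0 and the code dereferences the root's null parent.

```java
import java.util.ArrayList;
import java.util.List;

public class DropPathSketch {
    // Assumed tree depth for the sketch; not taken from Mallet's configuration.
    static final int NUM_LEVELS = 3;

    // Hypothetical stand-in for HierarchicalLDA.NCRPNode.
    static class Node {
        Node parent;                       // null for the root
        int customers;                     // documents whose path passes through this node
        List<Node> children = new ArrayList<>();

        Node(Node parent) {
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        void remove(Node child) {
            children.remove(child);
        }

        // Same control flow as the dropPath() quoted in the question.
        void dropPath() {
            Node node = this;
            node.customers--;
            if (node.customers == 0) {
                node.parent.remove(node);
            }
            for (int l = 1; l < NUM_LEVELS; l++) {
                node = node.parent;
                node.customers--;
                if (node.customers == 0) {
                    node.parent.remove(node); // the root has no parent
                }
            }
        }
    }

    // Builds one root-to-leaf path used by `documents` documents and returns the leaf.
    static Node buildPath(int documents) {
        Node node = new Node(null);
        node.customers = documents;
        for (int l = 1; l < NUM_LEVELS; l++) {
            node = new Node(node);
            node.customers = documents;
        }
        return node;
    }

    // Returns true if dropping one document's path raises a NullPointerException.
    public static boolean dropPathThrows(int documents) {
        try {
            buildPath(documents).dropPath();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("1 document:  NPE = " + dropPathThrows(1));
        System.out.println("2 documents: NPE = " + dropPathThrows(2));
    }
}
```

In this sketch the exception appears only in the single-document case, which matches the behavior described above: with two or more documents the root keeps at least one customer while a path is being resampled.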

The solution in my case is to write a program that splits the .xml file I want to run HLDATUI on into multiple smaller files, which can then be imported and analyzed.
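A minimal sketch of such a splitter could look like the following. It assumes the article text has already been extracted to plain text with paragraphs separated by blank lines; the class name DocumentSplitter, the part-NNNN.txt naming, and the default of five paragraphs per file are all hypothetical choices. For the actual .xml file, splitting per element (for example per section) would likely be more appropriate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DocumentSplitter {

    // Splits text on blank lines and groups paragraphsPerFile paragraphs into each chunk.
    public static List<String> splitIntoChunks(String text, int paragraphsPerFile) {
        if (paragraphsPerFile < 1) {
            throw new IllegalArgumentException("paragraphsPerFile must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String paragraph : text.split("\\R\\s*\\R")) {
            String trimmed = paragraph.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces produced by leading or trailing blank lines
            }
            if (count > 0) {
                current.append("\n\n");
            }
            current.append(trimmed);
            count++;
            if (count == paragraphsPerFile) {
                chunks.add(current.toString());
                current.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {
            chunks.add(current.toString()); // remaining paragraphs form the last chunk
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: DocumentSplitter <input.txt> <outputDir> [paragraphsPerFile]");
            return;
        }
        Path input = Paths.get(args[0]);
        Path outputDir = Paths.get(args[1]);
        int paragraphsPerFile = args.length > 2 ? Integer.parseInt(args[2]) : 5;

        Files.createDirectories(outputDir);
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        List<String> chunks = splitIntoChunks(text, paragraphsPerFile);

        // Write each chunk to its own file so that each one is imported as a separate document.
        for (int i = 0; i < chunks.size(); i++) {
            Path out = outputDir.resolve(String.format("part-%04d.txt", i));
            Files.write(out, chunks.get(i).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The output directory can then be passed as --input to the same bin/mallet import-dir command shown in the question, so that every part file becomes its own document.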

MrDeal