
I am trying to use the open-nlp Ruby gem to access the Java OpenNLP processor through RJB (Ruby Java Bridge). I am not a Java programmer, so I don't know how to solve this. Any recommendations regarding resolving it, debugging it, collecting more information, etc. would be appreciated.

The environment is Windows 8, Ruby 1.9.3p448, Rails 4.0.0, JDK 1.7.0-40 x586. Gems are rjb 1.4.8 and louismullie/open-nlp 0.1.4. For the record, this file runs in JRuby but I experience other problems in that environment and would prefer to stay native Ruby for now.

In brief, the open-nlp gem is failing with java.lang.NullPointerException, which surfaces in Ruby as a method_missing error. I hesitate to say why this is happening because I don't know, but it appears to me that the dynamically loaded Java object opennlp.tools.postag.POSTaggerME@1b5080a cannot be accessed, perhaps because OpenNLP::Bindings::Utils.tagWithArrayList isn't being set up correctly. OpenNLP::Bindings is Ruby. Utils and its methods are Java. Utils is also supposedly among the "default" jars and classes, which may be important.

What am I doing wrong, here? Thanks!

The code I am running is copied straight out of github/open-nlp. My copy of the code is:

class OpennlpTryer

  $DEBUG=false

  # From https://github.com/louismullie/open-nlp
  # Hints: Dir.pwd; File.expand_path('../../Gemfile', __FILE__);
  # Load the module
  require 'open-nlp'
  #require 'jruby-jars'

=begin
  # Alias "write" to "print" to monkeypatch the NoMethod write error
  java_import java.io.PrintStream
  class PrintStream
    java_alias(:write, :print, [java.lang.String])
  end
=end

=begin
  # Display path of jruby-jars jars...
  puts JRubyJars.core_jar_path # => path to jruby-core-VERSION.jar
  puts JRubyJars.stdlib_jar_path # => path to jruby-stdlib-VERSION.jar
=end
  puts ENV['CLASSPATH']

  # Set an alternative path to look for the JAR files.
  # Default is gem's bin folder.
  # OpenNLP.jar_path = '/path_to_jars/'

  OpenNLP.jar_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
  puts OpenNLP.jar_path
  # Set an alternative path to look for the model files.
  # Default is gem's bin folder.
  # OpenNLP.model_path = '/path_to_models/'

  OpenNLP.model_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
  puts OpenNLP.model_path
  # Pass some alternative arguments to the Java VM.
  # Default is ['-Xms512M', '-Xmx1024M'].
  # OpenNLP.jvm_args = ['-option1', '-option2']
  OpenNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
  # Redirect VM output to log.txt
  OpenNLP.log_file = 'log.txt'
  # Set default models for a language.
  # OpenNLP.use :language
  OpenNLP.use :english          # Make sure this is lower case!!!!

# Simple tokenizer

  OpenNLP.load

  sent = "The death of the poet was kept from his poems."
  tokenizer = OpenNLP::SimpleTokenizer.new

  tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
  puts "Tokenize #{tokens}"

# Maximum entropy tokenizer, chunker and POS tagger

  OpenNLP.load

  chunker = OpenNLP::ChunkerME.new
  tokenizer = OpenNLP::TokenizerME.new
  tagger = OpenNLP::POSTaggerME.new

  sent = "The death of the poet was kept from his poems."

  tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
  puts "Tokenize #{tokens}"

  tags = tagger.tag(tokens).to_a
# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
  puts "Tags #{tags}"

  chunks = chunker.chunk(tokens, tags).to_a
# => %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]
  puts "Chunks #{chunks}"


# Abstract Bottom-Up Parser

  OpenNLP.load

  sent = "The death of the poet was kept from his poems."
  parser = OpenNLP::Parser.new
  parse = parser.parse(sent)

=begin
  parse.get_text.should eql sent

  parse.get_span.get_start.should eql 0
  parse.get_span.get_end.should eql 46
  parse.get_child_count.should eql 1
=end

  child = parse.get_children[0]

  child.text # => "The death of the poet was kept from his poems."
  child.get_child_count # => 3
  child.get_head_index #=> 5
  child.get_type # => "S"

  puts "Child: #{child}"

# Maximum Entropy Name Finder*

  OpenNLP.load

  # puts File.expand_path('.', __FILE__)
  text = File.read('./spec/sample.txt').gsub!("\n", "")

  tokenizer = OpenNLP::TokenizerME.new
  segmenter = OpenNLP::SentenceDetectorME.new
  puts "Tokenizer: #{tokenizer}"
  puts "Segmenter: #{segmenter}"

  ner_models = ['person', 'time', 'money']
  ner_finders = ner_models.map do |model|
    OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
  end
  puts "NER Finders: #{ner_finders}"

  sentences = segmenter.sent_detect(text)
  puts "Sentences: #{sentences}"

  named_entities = []

  sentences.each do |sentence|
    tokens = tokenizer.tokenize(sentence)
    ner_models.each_with_index do |model, i|
      finder = ner_finders[i]
      name_spans = finder.find(tokens)
      name_spans.each do |name_span|
        start = name_span.get_start
        stop = name_span.get_end-1
        slice = tokens[start..stop].to_a
        named_entities << [slice, model]
      end
    end
  end
  puts "Named Entities: #{named_entities}"

# Loading specific models
# Just pass the name of the model file to the constructor. The gem will search for the file in the OpenNLP.model_path folder.

  OpenNLP.load

  tokenizer = OpenNLP::TokenizerME.new('en-token.bin')
  tagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')
  name_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')
# etc.
  puts "Tokenizer: #{tokenizer}"
  puts "Tagger: #{tagger}"
  puts "Name Finder: #{name_finder}"

# Loading specific classes
# You may want to load specific classes from the OpenNLP library that are not loaded by default. The gem provides an API to do this:

# Default base class is opennlp.tools.
  OpenNLP.load_class('SomeClassName')
# => OpenNLP::SomeClassName

# Here, we specify another base class.
  OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
  # => OpenNLP::SomeOtherClass

end

The failing line is line 73 (tokens holds the tokenized sentence being processed):

  tags = tagger.tag(tokens).to_a
# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]

tagger.tag calls open-nlp/classes.rb line 13, which is where the error is thrown. The code there is:

class OpenNLP::POSTaggerME < OpenNLP::Base

  unless RUBY_PLATFORM =~ /java/
    def tag(*args)
      OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])  # <== Line 13
    end
  end

end

The Ruby error thrown at this point is: `method_missing': unknown exception (NullPointerException). Debugging this, I found the error java.lang.NullPointerException. args[0] is the sentence being processed. @proxy_inst is opennlp.tools.postag.POSTaggerME@1b5080a.

OpenNLP::Bindings sets up the Java environment. For example, it sets up the Jars to be loaded and the classes within those Jars. In line 54, it sets up defaults for RJB, which should set up OpenNLP::Bindings::Utils and its methods as follows:

  # Add in Rjb workarounds.
  unless RUBY_PLATFORM =~ /java/
    self.default_jars << 'utils.jar'
    self.default_classes << ['Utils', '']
  end

utils.jar and Utils.java are in the CLASSPATH with the other Jars being loaded. They are being accessed, which is verified because the other Jars throw error messages if they are not present. The CLASSPATH is:

.;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin

The application's jars are in D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin and, again, if they are not there I get error messages about the other jars. The jars and Java files in ...\bin include:

jwnl-1.3.3.jar
opennlp-maxent-3.0.2-incubating.jar
opennlp-tools-1.5.2-incubating.jar
opennlp-uima-1.5.2-incubating.jar
utils.jar
Utils.java

Utils.java is as follows:

import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

So, it should define tagWithArrayList and import opennlp.tools.postag.POSTagger. (OBTW, just to try, I changed every occurrence of POSTagger to POSTaggerME in this file. It changed nothing...)

The tools Jar file, opennlp-tools-1.5.2-incubating.jar, includes postag/POSTagger and POSTaggerME class files, as expected.

Error messages are:

D:\BitNami\rubystack-1.9.3-12\ruby\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb
.;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `method_missing': unknown exception (NullPointerException)
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:73:in `<class:OpennlpTryer>'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Modified Utils.java:

import java.util.Arrays;
import java.util.Object;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, Object[] objectArray) {
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, Object[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, Object[] tokens, Object[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(Object[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

Modified error messages:

Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

Revised error with Utils.java revised to "import java.lang.Object;":

Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

Rescue removed from OpennlpTryer shows error trapped in classes.rb:

Uncaught exception: uninitialized constant OpenNLP::POSTaggerME::ArrayStoreException
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:16:in `rescue in tag'
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

Same error but with all rescues removed so it's "native Ruby"

Uncaught exception: unknown exception
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `method_missing'
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `tag'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

Revised Utils.java:

import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
      // Debug output: the runtime type of the argument and its default string form.
      System.out.println("Tokens: ("+objectArray.getClass().getSimpleName()+"): \n"+objectArray);
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

I ran cavaj on the Utils.class that I unzipped from utils.jar, and this is what I found. It differs from Utils.java by quite a bit. Both come installed with the open-nlp 0.1.4 gem. I don't know if this is the root cause of the problem, but this file is at the core of where it breaks and we have a major discrepancy. Which should we use?

import java.util.ArrayList;
import java.util.Arrays;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.postag.POSTagger;

public class Utils
{

    public Utils()
    {
    }

    public static String[] tagWithArrayList(POSTagger postagger, ArrayList aarraylist[])
    {
        return postagger.tag(getStringArray(aarraylist));
    }

    public static Object[] findWithArrayList(NameFinderME namefinderme, ArrayList aarraylist[])
    {
        return namefinderme.find(getStringArray(aarraylist));
    }

    public static Object[] chunkWithArrays(ChunkerME chunkerme, ArrayList aarraylist[], ArrayList aarraylist1[])
    {
        return chunkerme.chunk(getStringArray(aarraylist), getStringArray(aarraylist1));
    }

    public static String[] getStringArray(ArrayList aarraylist[])
    {
        String as[] = (String[])Arrays.copyOf(aarraylist, aarraylist.length, [Ljava/lang/String;);
        return as;
    }
}

Utils.java in use as of 10/07, compiled and compressed into utils.jar:

import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

Failures are occurring in BindIt::Binding::load_klass in line 110 here:

# Private function to load classes.
# Doesn't check if initialized.
def load_klass(klass, base, name=nil)
  base += '.' unless base == ''
  fqcn = "#{base}#{klass}"
  name ||= klass
  if RUBY_PLATFORM =~ /java/
    rb_class = java_import(fqcn)
    if name != klass
      if rb_class.is_a?(Array)
        rb_class = rb_class.first
      end
      const_set(name.intern, rb_class)
    end
  else
    rb_class = Rjb::import(fqcn)             # <== This is line 110
    const_set(name.intern, rb_class)
  end
end

The messages are as follows. However, they are inconsistent about which class is identified: each run may name a different one, any of POSTagger, ChunkerME, or NameFinderME.

D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `import': opennlp/tools/namefind/NameFinderME (NoClassDefFoundError)
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `load_klass'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:89:in `block in load_default_classes'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `each'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `load_default_classes'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:56:in `bind'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp.rb:14:in `load'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:54:in `<class:OpennlpTryer>'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

The interesting point about these errors is that they originate in OpennlpTryer line 54, which is:

  OpenNLP.load

At this point, OpenNLP fires up RJB which uses BindIt to load the jars and classes. This is well before the errors that I was seeing at the beginning of this question. However, I can't help but think it is all related. I really don't understand the inconsistency of these errors at all.

I was able to add the logging function to Utils.java, compile it after adding an "import java.io.*", and compress it into the jar. However, I pulled it back out because of these errors, as I didn't know whether or not it was involved. I don't think it was. In any case, because these errors occur during load, the method is never called, so logging there won't help...

For each of the other jars, the jar is loaded then each class is imported using RJB. Utils is handled differently and is specified as the "default". From what I can tell, Utils.class is executed to load its own classes?
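On the narrower point of what the JVM can load: a class loader resolves compiled .class files (loose in a directory or packed inside a jar), so a .java source file sitting on the classpath is not something it will pick up. A minimal sketch of that behavior follows. The class name HypotheticalUtils and the temporary directory are invented for the example and have nothing to do with the gem's files.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of open-nlp, RJB, or BindIt.
class SourceOnClasspathDemo {

    // Returns true if a class loader rooted at dir can load className.
    public static boolean canLoad(Path dir, String className) throws IOException {
        URL[] urls = { dir.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls)) {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("classpath-demo");

        // Only a source file is written; nothing is compiled.
        Path src = dir.resolve("HypotheticalUtils.java");
        Files.write(src, "public class HypotheticalUtils {}".getBytes(StandardCharsets.UTF_8));

        // The loader looks for HypotheticalUtils.class and ignores the .java file.
        System.out.println(canLoad(dir, "HypotheticalUtils")); // prints false
    }
}
```

If the same holds for the loader RJB starts, edits to Utils.java would only take effect after the file is recompiled and the resulting Utils.class is repackaged into utils.jar.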

Later update on 10/07:

Here is where I am, I think. First, I have some problem replacing Utils.java, as I described earlier today. That problem probably needs to be solved before I can install a fix.

Second, I now understand the difference between POSTagger and POSTaggerME: the ME stands for Maximum Entropy. The test code is trying to call POSTaggerME, but it looks to me like Utils.java, as implemented, supports POSTagger. I tried changing the test code to call POSTagger, but it said it couldn't find an initializer. Looking at the source for each of these, and I am guessing here, I think that POSTagger exists solely to support POSTaggerME, which implements it.
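That guess matches the ordinary Java interface/implementation pattern. Here is a minimal sketch of it, using invented stand-in names (Tagger, TaggerME, InterfaceDemo) rather than the real OpenNLP types. It shows why a method declared against the interface accepts the ME object, and why the interface itself offers no initializer to call.

```java
import java.util.Arrays;

// Hypothetical demo; Tagger and TaggerME are stand-ins, not OpenNLP classes.
class InterfaceDemo {

    // Declared against the interface, the way Utils.tagWithArrayList
    // is declared against POSTagger.
    public static String[] tagWith(Tagger tagger, String[] tokens) {
        return tagger.tag(tokens);
    }

    public static void main(String[] args) {
        // Tagger t = new Tagger();  // would not compile: an interface has no constructor
        Tagger t = new TaggerME();   // any implementing class can be passed instead
        System.out.println(tagWith(t, new String[] {"The", "poet"}).length); // prints 2
    }
}

// Stand-in for an interface such as POSTagger.
interface Tagger {
    String[] tag(String[] tokens);
}

// Stand-in for an implementation such as POSTaggerME.
class TaggerME implements Tagger {
    @Override
    public String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        Arrays.fill(tags, "X"); // placeholder tag; no real model is involved
        return tags;
    }
}
```

Under this reading, declaring the Utils parameter as POSTagger rather than POSTaggerME should make no difference to the call, which is consistent with the swap described earlier changing nothing.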

The source is opennlp-tools file opennlp-tools-1.5.2-incubating-sources.jar.

What I don't get is the whole reason for Utils in the first place? Why aren't the jars/classes provided in bindings.rb enough? This feels like a bad monkeypatch. I mean, look what bindings.rb does in the first place:

  # Default JARs to load.
  self.default_jars = [
    'jwnl-1.3.3.jar',
    'opennlp-tools-1.5.2-incubating.jar',
    'opennlp-maxent-3.0.2-incubating.jar',
    'opennlp-uima-1.5.2-incubating.jar'
  ]

  # Default namespace.
  self.default_namespace = 'opennlp.tools'

  # Default classes.
  self.default_classes = [
    # OpenNLP classes.
    ['AbstractBottomUpParser', 'opennlp.tools.parser'],
    ['DocumentCategorizerME', 'opennlp.tools.doccat'],
    ['ChunkerME', 'opennlp.tools.chunker'],
    ['DictionaryDetokenizer', 'opennlp.tools.tokenize'],
    ['NameFinderME', 'opennlp.tools.namefind'],
    ['Parser', 'opennlp.tools.parser.chunking'],
    ['Parse', 'opennlp.tools.parser'],
    ['ParserFactory', 'opennlp.tools.parser'],
    ['POSTaggerME', 'opennlp.tools.postag'],
    ['SentenceDetectorME', 'opennlp.tools.sentdetect'],
    ['SimpleTokenizer', 'opennlp.tools.tokenize'],
    ['Span', 'opennlp.tools.util'],
    ['TokenizerME', 'opennlp.tools.tokenize'],

    # Generic Java classes.
    ['FileInputStream', 'java.io'],
    ['String', 'java.lang'],
    ['ArrayList', 'java.util']
  ]

  # Add in Rjb workarounds.
  unless RUBY_PLATFORM =~ /java/
    self.default_jars << 'utils.jar'
    self.default_classes << ['Utils', '']
  end
Richard_G
  • classes.rb which is part of the open-nlp bindings to rjb has three calls to OpenNLP::Bindings::Utils.. Further diagnosis shows that all three of these calls are failing with some form of Java NoMethodError. I discovered this after trapping them all with rescue ArrayStoreException, NullPointerException, NoClassDefFoundError => e. Utils is the default namespace and utils.jar is the default jar. It seems the default bindings are not getting set up correctly. Asked Louis Mullie, the developer, for assistance but have not received a response as of yet. – Richard_G Oct 01 '13 at 19:04

2 Answers


I don't think you're doing anything wrong at all. You're also not the only one with this problem. It looks like a bug in Utils. Creating an ArrayList[] in Java doesn't make much sense. It's technically legal, but it would be an array of ArrayLists, which a) is just plain odd, b) is terrible practice with regard to Java generics, and c) won't cast properly to String[] as the author intends in getStringArray().

Given the way the utility's written and the fact that OpenNLP does, in fact, expect to receive a String[] as input for its tag() method, my best guess is that the original author meant to have Object[] where they have ArrayList[] in the Utils class.
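To illustrate the difference, here is a small stand-alone sketch. The class CopyOfDemo and its method names are made up for this example. Only the Arrays.copyOf call is taken from Utils.java, and nothing here assumes what RJB actually passes across the bridge.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical demo class; it is not part of the open-nlp gem.
class CopyOfDemo {

    // Same conversion as Utils.getStringArray, but declared on Object[].
    public static String[] fromObjects(Object[] in) {
        return Arrays.copyOf(in, in.length, String[].class);
    }

    // Same conversion with the parameter type used in Utils.java.
    public static String[] fromLists(ArrayList[] in) {
        return Arrays.copyOf(in, in.length, String[].class);
    }

    public static void main(String[] args) {
        // An Object[] that really holds Strings copies cleanly.
        Object[] words = {"The", "death", "of"};
        System.out.println(Arrays.toString(fromObjects(words)));

        // An ArrayList[] that holds real lists cannot be stored into a String[].
        ArrayList[] lists = { new ArrayList() };
        try {
            fromLists(lists);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException: " + e.getMessage());
        }

        // An ArrayList[] whose slots are all null copies without complaint.
        ArrayList[] empty = new ArrayList[2];
        System.out.println(Arrays.toString(fromLists(empty))); // prints [null, null]
    }
}
```

The null case is shown only as a Java fact, not as a diagnosis: an array of nulls copies silently, so any failure would surface later, wherever the strings are first used.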

Update

To output to a file in the root of your project directory, try adjusting the logging like this (I added another line for printing the contents of the input array):

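// Needs java.io.File, java.io.FileWriter and java.io.BufferedWriter
// (for example, import java.io.*;) at the top of Utils.java.
try {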
    File log = new File("log.txt");
    FileWriter fileWriter = new FileWriter(log);
    BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
    bufferedWriter.write("Tokens ("+objectArray.getClass().getSimpleName()+"): \r\n"+objectArray.toString()+"\r\n");
    bufferedWriter.write(Arrays.toString(objectArray));
    bufferedWriter.close(); 
}
catch (Exception e) {
    e.printStackTrace();
}
Josh
  • Heh, just realized that you seem to be the one who submitted the bug report I linked to, so...never mind that. I still think the Java code smells a bit fishy. – Josh Oct 04 '13 at 20:53
  • NP, it's nice to have the attention. So far, all I hear is crickets... Still working on it though, and will look into your advice. I don't do Java, at least not yet. I may be forced into it, however. Thanks... – Richard_G Oct 05 '13 at 00:30
  • You don't do Java, and I don't do Ruby, but the fact that a NullPointerException is coming from a relatively simple Java method should make it not so bad to work out. What's the data type of `args[0]` that's being sent to the utility? – Josh Oct 05 '13 at 00:39
  • Actually, before you check that, can you do a quick check on `tagger` to make sure *it's* not what's null? OpenNLP initialization can be a little involved; it's possible something got mixed up in that process. – Josh Oct 05 '13 at 00:53
  • args is an array of one element, args[0]. args[0] is an array of 11 elements. Each element is one word of the 11 word sentence being processed. tagger is an instance of OpenNLP::POSTaggerME, as in #. It has three instance variables, @model==opennlp.tools.postag.POSModel@fc7ceb, @proxy_class==# and @proxy_inst==opennlp.tools.postag.POSTaggerME@6d3209. – Richard_G Oct 05 '13 at 09:34
  • OK; ruling out those variables themselves from being the culprits, try changing `ArrayList` to `Object` everywhere you see it in Utils.java. I can't imagine Ruby sending an array of `String`s over to Java as an array of `ArrayList`s. I don't know the specifics of how RJB works - whether you have to recompile anything after making those changes - but I'm guessing your dev environment is taking care of that for you. – Josh Oct 05 '13 at 16:52
  • I made the changes. The modified file and the new error messages are at the end of my original question. Is that what you wanted done? Thanks for the help, here. – Richard_G Oct 05 '13 at 20:35
  • So it's getting past the original failing line now? Somewhere in the chunk method, if I count the lines correctly. I might be out of my Ruby depth on this one; it claims to be storing the wrong type of object in an array, but unless the full stack trace is longer, I'm not sure where...could the return type of `chunk()` actually be a `String` array just like `tag()`? The comments seem to imply that it would be... – Josh Oct 06 '13 at 03:41
  • No, the additional lines were only from debug entries, rescues actually, so that I could see what was going on. The error occurred exactly as before, but the message changed due to the Utils.java edits. – Richard_G Oct 07 '13 at 00:07
  • Woops, just now noticed this - in the modified Utils.java, the line for `Object` should be `import java.lang.Object;` - normally it's not necessary at all to import `Object` in Java, but I have no idea what the Ruby bridge requires. Sorry I can't be of more direct help; if this fails with a similarly cryptic stack trace, I'm not sure quite where to look next (though I must admit, I've gotten somewhat addicted to trying to fix this one from the outside :)). – Josh Oct 07 '13 at 00:13
  • Okay, I made that change, it appears without noticeable effect. I added a series of three errors at the end of the question, starting at the bolded headline. I have taken to minimizing the additional rescues in case they are masking a problem that you might see. The first of the three sets of messages shows the errors didn't change, except line numbers where I pulled extraneous rescues from other locations. The second pulled rescues from the main routine. The third pulled rescues from the exact line that is failing. – Richard_G Oct 07 '13 at 00:56
  • Also, is there any Java code that I can insert either before the failing line or within it that would help us debug this? I have been working on RJB so that I have more experience inserting Java code or calling Java classes. – Richard_G Oct 07 '13 at 00:59
  • Sure; since my changes seemed to have hurt more than they helped, put Utils.java back the way it was, and add the following to the beginning of `tagWithArrayList()` (on the line right after the opening brace...sorry about the formatting): `System.out.println("Tokens: ("+objectArray.getClass().getSimpleName()+"): \n"+objectArray);` There's more we can do if this doesn't reveal anything; I'm beginning to wonder if I should edit the answer rather than adding comments, though... – Josh Oct 07 '13 at 01:09
  • Sure, if you add the information there it might be easier. In any case, I updated Utils.java as shown at the end of my question. I see no indication of changes in the run, however. I am not sure that it is directing to the console that I watch. Could you write the log to a file in the current directory? – Richard_G Oct 07 '13 at 01:41
  • That did nothing. No file was created. Here's what I think. Ruby/RJB is getting no where near Utils.java. The only reason the error messages changed was that I added rescues, so I masked the problem. Nothing changed in Utils.java has had any effect at all. So, it's back to square one for me. The bindings are screwed up and Ruby/RJB just isn't connecting the call at all. I think I need to go back to tracing the inner workings within RJB to find the break. Luckily, that's what I've been studying to do. What you have done has really helped me understand this! I'll be back. Thanks! – Richard_G Oct 07 '13 at 02:28
  • Does a JVM load filetype java files at all? If my classpath contains Utils.java and utils.jar which contains Utils.class, then RJB tries to load fqcn "Utils", which will it find? Utils.java and Utils.class differ, and I think we need Utils.java, which would be easier anyway. – Richard_G Oct 07 '13 at 05:27
  • I added the cavaj of Utils.class to the end of the question. This and Utils.java are both delivered with open-nlp in the same folder. And, they are different. What should be done? – Richard_G Oct 07 '13 at 05:54
  • UPDATE: We need to discuss this. I compiled Utils.java into Utils.class then decompiled it to find that that is basically the same as what I posted above. I tried two different decompilers. So, the code should be equivalent? – Richard_G Oct 07 '13 at 06:46
  • I should've caught this sooner...it looks like the Ruby binding is set up to use the .jar file, and if that's the case, without re-packaging your changes to Utils into a new utils.jar, they'll be meaningless. That said, the output from the decompiler you ran is, I believe, functionally identical to the source; the syntax has just been muddled a bit in translation. Discussing this on chat would probably be a better idea. I'm not sure quite how the chat system on SO works when a comment discussion is moved over there, but I'll be back sometime this evening, and we can try it. – Josh Oct 07 '13 at 13:26
  • Chatting would be great, if we can work this out. Let me know when would be good so I can be sure to be available. I have been able to compile and compress Utils.java now, but that caused a new series of errors during opennlp load, which is much earlier than I saw before. I updated the question. Look for the bold comment "Utils.java in use as of 10/07, compiled and compressed into utils.jar:" for the discussion. Thanks. OBTW, I awarded the bounty, which was ending, due to your support. We'll get this fixed... – Richard_G Oct 07 '13 at 16:52
  • Thanks for the extra rep :) - I'll probably be around 7:30-ish EST tonight, but it might take me a bit to set up my environment...it's probably time I installed Ruby anyway. – Josh Oct 07 '13 at 20:09
  • I'll be available. I don't know how chat works when it is automatically moved for sure, either. I've seen a lot of private chat rooms, so it's probably that? Also, see my very latest update and the end of the question titled: Later update on 10/07. Thanks. – Richard_G Oct 07 '13 at 20:43
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/38749/discussion-between-space-pope-and-r-g) – Josh Oct 07 '13 at 23:33
  • I am going to switch to chat mode. I assume it will leave pecker tracks here for us to follow. I guess we'll see... – Richard_G Oct 07 '13 at 23:33
3

SEE FULL CODE AT END FOR THE COMPLETE CORRECTED CLASSES.RB MODULE

I ran into the same problem today. I didn't quite understand why the Utils class was being used, so I modified the classes.rb file in the following way:

unless RUBY_PLATFORM =~ /java/
  def tag(*args)
    @proxy_inst.tag(args[0])
    #OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])
  end
end

With that change, the following test passes:

sent   = "The death of the poet was kept from his poems."
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
tags   = tagger.tag(tokens).to_a
# => ["prop", "prp", "n", "v-fin", "n", "adj", "prop", "v-fin", "n", "adj", "punc"]

R_G Edit: I tested that change and it eliminated the error. I am going to have to do more testing to ensure the outcome is what should be expected. However, following that same pattern, I made the following changes in classes.rb as well:

def chunk(tokens, tags)
  chunks = @proxy_inst.chunk(tokens, tags)
  # chunks = OpenNLP::Bindings::Utils.chunkWithArrays(@proxy_inst, tokens,tags)
  chunks.map { |c| c.to_s }
end

...

class OpenNLP::NameFinderME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def find(*args)
      @proxy_inst.find(args[0])
      # OpenNLP::Bindings::Utils.findWithArrayList(@proxy_inst, args[0])
    end
  end
end

This allowed the entire sample test to execute without failure. I will provide a later update regarding verification of the results.

FINAL EDIT AND UPDATED CLASSES.RB per Space Pope and R_G:

As it turns out, this answer was key to the desired solution. However, the results were inconsistent with the change as first made. We continued to drill down and added strong typing to the calls, as RJB specifies. Each call now goes through the _invoke method, whose arguments are the name of the Java method, the type signature of its parameters, and then the parameters themselves. André's recommendation was key to the solution, so kudos to him. Here is the complete module. It eliminates the need for Utils.class, which was attempting to make these calls but failing. We plan to issue a GitHub pull request for the open-nlp gem to update this module:

require 'open-nlp/base'

class OpenNLP::SentenceDetectorME < OpenNLP::Base; end

class OpenNLP::SimpleTokenizer < OpenNLP::Base; end

class OpenNLP::TokenizerME < OpenNLP::Base; end

class OpenNLP::POSTaggerME < OpenNLP::Base

  unless RUBY_PLATFORM =~ /java/
    def tag(*args)
      @proxy_inst._invoke("tag", "[Ljava.lang.String;", args[0])
    end
  end
end


class OpenNLP::ChunkerME < OpenNLP::Base

  if RUBY_PLATFORM =~ /java/

    def chunk(tokens, tags)
      # Convert each argument independently; Array#to_a is a no-op on arrays.
      tokens = tokens.to_a.to_java(:String)
      tags = tags.to_a.to_java(:String)
      @proxy_inst.chunk(tokens,tags).to_a
    end

  else

    def chunk(tokens, tags)
      chunks = @proxy_inst._invoke("chunk", "[Ljava.lang.String;[Ljava.lang.String;", tokens, tags)
      chunks.map { |c| c.to_s }
    end

  end

end

class OpenNLP::Parser < OpenNLP::Base

  def parse(text)

    tokenizer = OpenNLP::TokenizerME.new
    full_span = OpenNLP::Bindings::Span.new(0, text.size)

    parse_obj = OpenNLP::Bindings::Parse.new(
      text, full_span, "INC", 1, 0)

    tokens = tokenizer.tokenize_pos(text)

    tokens.each_with_index do |tok,i|
      start, stop = tok.get_start, tok.get_end
      token = text[start..stop-1]
      span = OpenNLP::Bindings::Span.new(start, stop)
      parse = OpenNLP::Bindings::Parse.new(text, span, "TK", 0, i)
      parse_obj.insert(parse)
    end

    @proxy_inst.parse(parse_obj)

  end

end

class OpenNLP::NameFinderME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def find(*args)
      @proxy_inst._invoke("find", "[Ljava.lang.String;", args[0])
    end
  end
end
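
The signature strings passed to _invoke in the module above are easy to mistype, so here is a minimal sketch of how they are put together. It is plain Ruby and needs neither Java nor rjb to run. The helper name rjb_signature is hypothetical (it is not part of rjb or open-nlp), and it only covers object types and arrays of them, which is all the module above uses; primitives are not handled.

```ruby
# Hypothetical helper, not part of rjb or open-nlp: builds the parameter
# signature string in the form used by the _invoke calls above.
# Each type is written Java-style, e.g. "java.lang.String[]".
def rjb_signature(*types)
  types.map do |type|
    depth = type.scan("[]").size       # number of array dimensions
    base  = type.sub(/(\[\])+\z/, "")  # class name without the [] suffixes
    ("[" * depth) + "L#{base};"        # one "[" per dimension, then L<class>;
  end.join
end

# One String[] parameter, as in tag and find.
TAG_SIG = rjb_signature("java.lang.String[]")

# Two String[] parameters, as in chunk.
CHUNK_SIG = rjb_signature("java.lang.String[]", "java.lang.String[]")

puts TAG_SIG
puts CHUNK_SIG
```

The signatures for several parameters are simply concatenated, which is why the chunk call carries "[Ljava.lang.String;" twice.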
Richard_G
André
  • This answer looks very good. There may be other instances of these calls that need to be updated, too. Let me check the results. Thanks! – Richard_G Oct 08 '13 at 20:30