
We have a Maven project with some files in the resources directory, which get copied into the root of the jar file. The following code works fine during JUnit testing but stops working once I run it from the jar:

        Configuration configuration = new Configuration();
        String pathString = MainClass.class.getClassLoader().getResource("dir").getPath();
        Path path = new Path(pathString);

        logger.debug(path);
        FileSystem fs = path.getFileSystem(configuration);
        if (fs.exists(path)) {
            logger.debug("WOOOOO");
        } else {
            logger.debug("BOOOOO");
        }

While testing, the output is:

DEBUG: /path/to/project/target/test-classes/dir
DEBUG: WOOOOO

While running from jar I get:

DEBUG file:/path/to/jar/project.jar!/dir
DEBUG BOOOOO

Needless to say, the jar file is in the correct location and the dir is in the root of that jar.

In case you're wondering why we're doing this, the second half is a small test excerpt that mimics what NaiveBayesModel.materialize() in Mahout does. We just need to be able to create a path that Mahout will understand.

aiguofer
  • Why do you need to get the path during runtime? – João Melo May 05 '14 at 22:31
  • @JoãoMelo So we can run it in the various environments (which are all set up slightly differently) as well as locally for testing. – aiguofer May 05 '14 at 23:11
  • The `Path` constructor instantiates a `URI` object. In the second case, have you tried inserting `jar:` at the beginning of the string? – João Melo May 06 '14 at 03:26
  • @JoãoMelo Thanks for the tip, I just tried it but then I get: java.io.IOException: No FileSystem for scheme: jar at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1408) – aiguofer May 06 '14 at 14:12
  • I was kind of expecting that. Just read an answer that clarifies this question about the `jar` scheme - http://stackoverflow.com/a/6247181/1033945. – João Melo May 06 '14 at 16:50

1 Answer


The exception `java.io.IOException: No FileSystem for scheme: jar` means that you can't create a File object or open an FSDataInputStream (which is what Mahout does) with a URI that references something inside a jar file.
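To illustrate why the two runs in the question print different strings, here is a minimal sketch using only the JDK. The class name `JarUrlDemo` and the helper `describe` are made up for this example, and the two URL strings are reconstructed from the log output in the question. For a `jar:` URL, `getPath()` returns the nested `file:...!/dir` string rather than a plain filesystem path, which would explain the second log line.

```java
import java.net.URI;
import java.net.URL;

public class JarUrlDemo {
    // Hypothetical helper: returns "<protocol> | <path>" for a URL string.
    static String describe(String urlString) throws Exception {
        URL url = URI.create(urlString).toURL();
        return url.getProtocol() + " | " + url.getPath();
    }

    public static void main(String[] args) throws Exception {
        // Resource found in a directory on the classpath (the JUnit case).
        System.out.println(describe("file:/path/to/project/target/test-classes/dir"));
        // Resource found inside a jar (the packaged case).
        System.out.println(describe("jar:file:/path/to/jar/project.jar!/dir"));
    }
}
```

The first call yields `file | /path/to/project/target/test-classes/dir`, and the second yields `jar | file:/path/to/jar/project.jar!/dir`. In the second case the string passed to Hadoop's `Path` is not a location on any filesystem Hadoop knows about, so `fs.exists(path)` returns false.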

The `file` and `hdfs` schemes have FileSystem implementations. So, since you want to call NaiveBayesModel.materialize(), I guess the only solution for your case is to dump the files from the dir directory of your jar into one of those two FileSystems and then create a Path from that location.
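A rough sketch of that "dump the files" idea, using only the JDK, might look like the following. The class name `ResourceExtractor`, the `extract` method, and the idea of passing an explicit list of file names are all assumptions made for illustration, since listing a directory inside a jar is not straightforward through the class loader. Note that `Path` here is `java.nio.file.Path`, not Hadoop's `Path`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

public class ResourceExtractor {
    // Copies the named classpath resources under resourceDir into a fresh
    // temporary directory on the local filesystem and returns that directory.
    static Path extract(ClassLoader loader, String resourceDir, List<String> names)
            throws IOException {
        Path target = Files.createTempDirectory("model-");
        for (String name : names) {
            String resource = resourceDir + "/" + name;
            // getResourceAsStream works for resources inside a jar as well.
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: " + resource);
                }
                Files.copy(in, target.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // File names are taken from the command line; "dir" matches the question.
        Path dir = extract(ResourceExtractor.class.getClassLoader(), "dir", Arrays.asList(args));
        System.out.println("Extracted to " + dir);
    }
}
```

The returned directory lives on the local filesystem, so something like `new org.apache.hadoop.fs.Path(dir.toUri())` should give a `file:` path that Hadoop can resolve. Whether Mahout accepts it unchanged is untested here.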

On the other hand, you can try to reproduce what Mahout does, which is to instantiate a NaiveBayesModel.

I don't have experience with Mahout, but I guess this is a good starting point. Hope it helps.

João Melo
  • Well, I tried to go the route of reimplementing materialize(), which I was able to do successfully using getResourceAsStream and converting it to a DataInputStream. The problem is that along with the model I need to read in the labels, dictionary, and df-count... reimplementing all of them seems like a bad solution. I'm considering implementing a Hadoop FileSystem that can read from jar files... I'm surprised this doesn't exist yet! – aiguofer May 06 '14 at 20:59
  • @aiguofer, did you find a solution? I guess implementing a FileSystem to read from jars isn't feasible. – João Melo Jun 20 '14 at 18:04
  • Unfortunately I didn't have the time to try it. I imagine it's possible, but it's non-trivial. We ended up just moving the files out and using a config file to point to their location. – aiguofer Jun 23 '14 at 14:54