13

I have a large (12GB) file and I need to extract small pieces of data (a few kilobytes each) from it, using Java. Seeking and reading the data, once the file is open, is very fast, but opening the file itself takes a long time - about 90 seconds. Is there a way to speed up the open file operation in Java?

To clarify, I've tried the following options to open and read a file:

    new FileInputStream(file);
    new RandomAccessFile(file, "r");
    Files.newByteChannel(path, StandardOpenOption.READ);

Each one of these yielded similar results.
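
For anyone who wants to reproduce the measurement, here is a minimal sketch of a timing harness for the three calls above. The class name `OpenTimer` and the helper `timeOpens` are made up for illustration and are not from the original post; the path is expected as the first command-line argument.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original question.
public class OpenTimer {

    /** Opens the file once with each approach and returns the elapsed milliseconds of each open call. */
    static Map<String, Long> timeOpens(Path path) throws IOException {
        Map<String, Long> millis = new LinkedHashMap<>();

        long start = System.nanoTime();
        try (FileInputStream in = new FileInputStream(path.toFile())) {
            millis.put("FileInputStream", (System.nanoTime() - start) / 1_000_000);
        }

        start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            millis.put("RandomAccessFile", (System.nanoTime() - start) / 1_000_000);
        }

        start = System.nanoTime();
        try (SeekableByteChannel channel = Files.newByteChannel(path, StandardOpenOption.READ)) {
            millis.put("Files.newByteChannel", (System.nanoTime() - start) / 1_000_000);
        }

        return millis;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OpenTimer <path-to-large-file>");
            return;
        }
        for (Map.Entry<String, Long> entry : timeOpens(Paths.get(args[0])).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + " ms");
        }
    }
}
```

Only the open call is inside each timed region, so the numbers do not include any seeking or reading.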

tshepang
Little Bobby Tables
  • Which operating system and file system are you using? – Stefan Dec 10 '12 at 10:57
  • @Stefan Windows 7 and its standard FS. – Little Bobby Tables Dec 10 '12 at 10:59
  • Have a look at the skip()-method of `InputStream`. I haven't tried this myself, hence only the comment, but if you know what portions you need, maybe it can help you to skip a certain part of the file when opening it. http://docs.oracle.com/javase/1.4.2/docs/api/java/io/InputStream.html#skip%28long%29 – Blacklight Dec 10 '12 at 11:01
  • Anybody knows if there is a way for memory-mapped files in java? – Fildor Dec 10 '12 at 11:04
  • @Blacklight skipping, setting position, etc. works fine and very fast. 90% of the time is spent just on opening the file, before the first skip. – Little Bobby Tables Dec 10 '12 at 11:05
  • @Fildor memory-mapping in Java is done using a MappedByteBuffer, that requires a file channel, that requires a file input stream, hence back to my first attempt. – Little Bobby Tables Dec 10 '12 at 11:08
  • @LittleBobbyTables Ok, I thought this could be a way to avoid the lag when opening, but haven't done it myself before. – Fildor Dec 10 '12 at 11:13
  • Which line exactly is taking so long? Is it instantiating the File object, or instantiating the FileInputStream, etc.? Looking at the source code, FileInputStream, for instance, simply queries the security manager and immediately delegates to the JNI open method (http://kickjava.com/src/java/io/FileInputStream.java.htm). There is not much you can do there, as the OS takes over at this point. Did you try alternative file systems? I will partly skip the obvious question: why are you doing this? :-) – Stefan Dec 10 '12 at 11:13
  • @LittleBobbyTables Ok sorry, maybe you can improve the speed by playing with buffer-sizes. I also found this question, have a look at the `java.nio` package instead of `java.io`: http://stackoverflow.com/questions/2356137/read-large-files-in-java – Blacklight Dec 10 '12 at 11:22
  • @Blacklight, BufferedInputStream just wraps a FileInputStream, so the file still has to be opened. No solution there. – Damian Leszczyński - Vash Dec 10 '12 at 11:40
  • @LittleBobbyTables, could you be more specific about the point where the app waits for 90 seconds? – Damian Leszczyński - Vash Dec 10 '12 at 11:40
  • @Vash any of the constructors listed above takes approx 90 seconds to run when opening a 12GB file. The call to the ctr. itself stalls. – Little Bobby Tables Dec 10 '12 at 11:42
  • @Stefan As I said, the Ctr. is taking that long. That's an interesting point you're raising, because it seems that the JNI call takes that much time. – Little Bobby Tables Dec 10 '12 at 11:45
  • did you try `Scanner` to check open performance? – vishal_aim Dec 10 '12 at 11:46
  • @vishal_aim I checked the performance using good-old System.currentTimeMillis(). Link to Scanner? – Little Bobby Tables Dec 10 '12 at 11:53
  • @LittleBobbyTables, the constructor of FileInputStream validates the input (File) and then calls open(String), which is a native method. So you could try an alternative JDK or another file system, or write your own JNI code. Try writing some C code that opens the file as a performance test. – Damian Leszczyński - Vash Dec 10 '12 at 11:54
  • @Stefan it seems that Vash is right. I stepped into the lib code with the debugger and reached the native JNI open() function. The problem might be at the OS level. – Little Bobby Tables Dec 10 '12 at 11:59
  • @Vash it seems indeed that the native call is to blame, I'll test further. – Little Bobby Tables Dec 10 '12 at 12:00
  • I tried the example here: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html and FileChannel.open() completes in a split second for a file of few gb. Have you tried that example? – Aksel Willgert Dec 10 '12 at 12:00
  • @AkselWillgert FileChannel.open() does not seem to differ from any other method for getting a file channel. – Little Bobby Tables Dec 10 '12 at 12:06
  • 1
    @Little Bobby Tables I had a quick look at FileChannelImpl. It seems to delay the native calls, so perhaps the performance problem is still there and just occurs at a later time? Did you try to open the file on a different operating system? Perhaps a virus scanner is intercepting the read to scan the file in an ad-hoc manner? – Stefan Dec 10 '12 at 12:10
  • Can you reproduce the problem with different jre or on different Computer? Which jre are you using now? – Aksel Willgert Dec 10 '12 at 12:10
  • @LittleBobbyTables, good luck with the tests. I'm looking forward to the results. – Damian Leszczyński - Vash Dec 10 '12 at 12:22
  • @Little Bobby Tables: just to try `Scanner(File source)` link:http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html – vishal_aim Dec 10 '12 at 12:33
  • @vishal_aim I really don't see how a scanner is going to help. I'll still have to open the file, and I don't need text parsing, I know exactly to which offset to jump. – Little Bobby Tables Dec 10 '12 at 13:08
  • @AkselWillgert Problem occurs both on JDK 1.6 and 1.7. I don't have a different vendor JVM available right now. It persisted on a different machine. A different OS, unfortunately, is not a possible quick fix. – Little Bobby Tables Dec 10 '12 at 13:10
  • Maybe this is already in the comments and I'm missing something, but what is the minimal code that recreates this problem? – Aksel Willgert Dec 10 '12 at 13:13
  • 1
    @AkselWillgert just run any of the constructors listed above on a very large file (12GB, in my case) and measure the time, on a Windows OS. – Little Bobby Tables Dec 10 '12 at 13:32
  • @Stefan After much investigation it seems that the virus scanner is the main suspect, 99% sure. Can you please post that as an answer, so I can accept it and give you credit? To be specific, the problem is that Java's open file operation triggers the OS operation that runs the virus scan, and the solution is to add Java to the list of trusted processes. – Little Bobby Tables Dec 10 '12 at 15:17

2 Answers

10

From the comments: To be specific, the problem is that Java's open file operation triggers the OS operation that runs the virus scan, and the solution is to add Java to the list of trusted processes.

Stefan
  • To clarify - Either add Java as a trusted process, exclude the large file from the on-access virus scan, or any other setting that keeps the virus scanner away. Thanks, Stefan! – Little Bobby Tables Dec 11 '12 at 07:46
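
One way to check whether a scanner exclusion has actually taken effect is to time the open and the first read separately, before and after changing the setting, since the comments point out that `FileChannel` may defer native work until later. The following is only a sketch under that assumption: the class `OpenVersusRead`, the method `measure`, and the offset and length in `main` are placeholders, not something from the thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for comparing open time with first-read time.
public class OpenVersusRead {

    /** Returns {openMillis, readMillis, bytesRead} for one open followed by one positional read. */
    static long[] measure(Path path, long offset, int length) throws IOException {
        long beforeOpen = System.nanoTime();
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long afterOpen = System.nanoTime();

            // Positional reads do not move the channel's own position.
            ByteBuffer buffer = ByteBuffer.allocate(length);
            int total = 0;
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, offset + total);
                if (n < 0) {
                    break; // reached end of file before the buffer was full
                }
                total += n;
            }
            long afterRead = System.nanoTime();

            return new long[] {
                (afterOpen - beforeOpen) / 1_000_000,
                (afterRead - afterOpen) / 1_000_000,
                total
            };
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OpenVersusRead <path-to-large-file>");
            return;
        }
        // Offset and length are arbitrary placeholder values.
        long[] result = measure(Paths.get(args[0]), 0L, 4096);
        System.out.println("open: " + result[0] + " ms, first read: " + result[1]
                + " ms, bytes read: " + result[2]);
    }
}
```

If the open time drops sharply once Java or the file is excluded from the on-access scan, that supports the diagnosis above.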
1

The delay you are seeing is most likely caused by the native (JNI) call behind the constructor.

Your code waits inside the FileInputStream(String) constructor. That constructor verifies the passed path and then calls the method private native void open(String).

The OpenJDK implementation of FileInputStream#open(String) then looks like this:

    JNIEXPORT void JNICALL
    Java_java_io_FileInputStream_open(JNIEnv *env, jobject this, jstring path) {
        fileOpen(env, this, path, fis_fd, O_RDONLY);
    }

This takes us to io_util_md.c and the function

    jlong winFileHandleOpen(JNIEnv *env, jstring path, int flags)

You can analyse the code there.


At this point you have several options:

  • Try a different JDK.
  • Write some C code to create your own JNI method.
  • Try a different file system.
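
When comparing runs across JDKs or file systems, it helps to record which combination each timing came from. Below is a small sketch of how that could be logged; the class `EnvironmentReport` and the method `describe` are names invented here for illustration.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that labels a measurement with the JVM and file system in use.
public class EnvironmentReport {

    /** Builds a one-line description of the JVM, the OS, and the file store that holds the file. */
    static String describe(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return "java.vendor=" + System.getProperty("java.vendor")
                + ", java.version=" + System.getProperty("java.version")
                + ", os.name=" + System.getProperty("os.name")
                + ", fileStore.type=" + store.type()
                + ", fileSize=" + Files.size(path);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EnvironmentReport <path-to-large-file>");
            return;
        }
        System.out.println(describe(Paths.get(args[0])));
    }
}
```

Printing this line next to each timing makes it easier to tell whether a change in JDK or file system made any difference.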