
I am trying to solve a producer-consumer problem which is I/O intensive. The producer constantly appends data to a file and the consumer reads from this growing file. The file size is usually in the GB range (around 10GB).

Initially I tried BufferedOutputStream and BufferedInputStream to write and read the file. During bursts of data, which come at 9:30am, it takes too much system CPU, around 30-40% (presumably from the system calls for I/O).
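
For reference, the consumer side currently looks roughly like this (simplified sketch, not the real code; the buffer size and error handling are placeholders):

    // Current consumer: plain buffered stream reads (Java 6 style, no try-with-resources).
    BufferedInputStream in = null;
    try {
        in = new BufferedInputStream(new FileInputStream("C:\\readThisFile.dat"));
        byte[] chunk = new byte[64 * 1024];   // 64KB buffer, the real size may differ
        int n;
        while ((n = in.read(chunk)) != -1) {
            // process the n bytes just read
        }
    } finally {
        if (in != null) in.close();
    }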

I am looking at memory-mapped files to make this faster.

    File fileToRead = new File("C:\\readThisFile.dat");
    FileChannel inChannel = new FileInputStream(fileToRead).getChannel();
    // map the entire file read-only in a single mapping
    MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
    // copy the mapped contents into a heap byte array
    byte[] data = new byte[(int) inChannel.size()];
    buffer.get(data);

1) Since readThisFile.dat is larger than Integer.MAX_VALUE bytes, inChannel.map() over the whole file throws an exception.

2) How can the consumer constantly read data from an extremely large file using memory-mapped files? The consumer could load maybe 100MB at a time and keep looking for more data (see the rough sketch after these questions).

3) Is there a faster solution in Java, something other than memory-mapped files?
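
For 2), this is the kind of loop I have in mind, mapping one window at a time instead of the whole file (untested sketch; the 100MB window size, the polling sleep, and the record-boundary handling are just assumptions):

    // Map the file in fixed-size read-only windows so no single mapping exceeds Integer.MAX_VALUE.
    final long WINDOW = 100L * 1024 * 1024;                 // 100MB per mapping (arbitrary choice)
    FileChannel channel = new FileInputStream(fileToRead).getChannel();
    long position = 0;
    while (true) {
        long available = channel.size() - position;         // bytes the producer has written so far
        if (available <= 0) {
            Thread.sleep(10);                                // wait for the producer to append more
            continue;
        }
        long length = Math.min(available, WINDOW);
        MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
        // consume the bytes in 'window' here; this does not handle records split across windows
        position += length;
    }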

Srujan Kumar Gulla
  • Why would you create a memory mapped buffer, and then immediately copy its contents to a byte array? The whole point of a mapped buffer is to avoid copying data to a Java array. Even if this worked, it would be slower than reading directly to the array. For a file that large, you should use stream processing. Why does the whole file need to be loaded in memory at once? – erickson May 27 '16 at 18:21
  • @erickson The whole file doesn't need to be in memory at once. How does the consumer read the contents of the file if it doesn't call a get() method? Stream processing, we are still on Java 6 and I think Streams are part of the Java 8 API, right? – Srujan Kumar Gulla May 27 '16 at 18:28
  • To add: The file literally can't be loaded into memory all at once unless you've got tons of spare RAM. You need to load it in piece by piece instead – Jeremy Kato May 27 '16 at 18:28
  • @JeremyKato Understood, maybe I should have rephrased that point, but yes, the application can't read the entire file into memory. That is why I am looking for a solution where I loop continuously. – Srujan Kumar Gulla May 27 '16 at 18:32
  • By "stream processing", I meant the general concept of loading, processing, and "forgetting" a minimal group of data before moving on to load and process the next. For example, maybe the file contains relatively small log events, each of which can be processed independently. No need to load the whole file in that case. As for not calling `get()`, you might call `get()`, `getInt()`, etc. as needed, of course. But if you simply call `get(byte[])` right off the bat, and *then* proceed to decode data into simpler data types, you've wasted the mapping. – erickson May 27 '16 at 18:38
  • If you are looking for a solution where you process the file stream-wise, we can't help you much without understanding more about the file format. The information in the question currently isn't very relevant if that's your primary aim. – erickson May 27 '16 at 18:39
  • Java is a poor tool for trying to do what you're doing - reading a huge file as it's being written. Java abstracts away OS specifics and generally buffers IO - both of which get in the way of reading from a file as it's growing. To read from a growing file you need to know exactly how big it is at the moment you're reading from it. That is an OS-specific operation, and the Java JVM's tendency to cache things really impairs that. This question is highly relevant: http://stackoverflow.com/questions/32319031/zero-length-read-from-file – Andrew Henle May 27 '16 at 19:17
  • Is your problem that you don't know how to avoid reading beyond the end of a file that is currently being appended by another process, as Andrew describes? Or is your problem that you want to read large files more quickly? – erickson May 27 '16 at 20:03
  • @erickson Sorry erickson, I was on vacation so I couldn't comment right away. My problem is how to read large growing files quickly. Please let me know if there are any mechanisms more efficient than memory-mapped files. I am storing raw bytes in the file. – Srujan Kumar Gulla May 31 '16 at 18:42
  • @SrujanKumarGulla Did you solve your problem yet? What are you doing with the data that you read? Copying to new files? Parsing and creating some objects? Analyzing and aggregating data? – erickson Jun 04 '16 at 16:55
  • @erickson I looked at the OpenHFT Chronicle API and memory-mapped files. Chronicle is too complicated and its API keeps changing often, which makes maintenance hard. Memory-mapped files were taking too much system CPU (since the consumer loads about 10MB from the file every millisecond), which is heavy I/O. Please suggest a better solution if you have one. Thanks – Srujan Kumar Gulla Jun 06 '16 at 14:40
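
Following the stream-processing and growing-file points in the comments above, a non-mapped alternative I am considering is a plain FileChannel consumer that only reads up to the size the file has reached so far (rough, untested sketch; the 1MB buffer and the sleep interval are just guesses):

    // Positional reads into a reusable direct buffer; read, process, forget, advance.
    FileChannel ch = new RandomAccessFile(fileToRead, "r").getChannel();
    ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);   // 1MB scratch buffer
    long position = 0;
    while (true) {
        if (position >= ch.size()) {            // nothing new appended yet
            Thread.sleep(10);
            continue;
        }
        buf.clear();
        int n = ch.read(buf, position);         // positional read, does not move the channel position
        if (n <= 0) {
            Thread.sleep(10);
            continue;
        }
        buf.flip();
        // decode and process the n bytes in 'buf', then move on
        position += n;
    }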

0 Answers