If the mapped file data is fully resident in physical memory, is there any benefit to reading it in parallel, for example by defining a number of sections, each with a start and end byte, and having a separate thread work on each section? The goal is to allow frequent, quick reads of data from a big binary file.

I've been running some tests (Java NIO) where each thread (testing with 4 threads) holds a reference to the same mmap, but since each thread changes the internal position of the mapped buffer to read the next set of bytes, this doesn't seem safe. Should I split the file into 4 mmapped chunks, one per thread?

UPDATE: To give more context, what I'm ultimately after is a data structure that holds references to a number of mmapped files, so that those references can be passed to a function that scans them in a loop, tests for values, and puts the matches into a byte buffer.

UPDATE: This is for read-only files.

marcin_koss

1 Answer

You can create a separate FileChannel for each thread, with each thread reading a different part of the file.

As the documentation says, FileChannels are thread-safe.

Your code would be something like this:

package nio;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class HughTest {

    public static void main(String[] args) {

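        // try-with-resources closes the channel (and the underlying file) on exit
        try (FileChannel inChannel = new RandomAccessFile("file_Path", "r").getChannel()) {

            // TODO In each thread, read only that thread's chunk instead of the whole file
            long fileSize = inChannel.size();
            // A ByteBuffer holds at most Integer.MAX_VALUE bytes; toIntExact fails fast on larger files
            ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(fileSize));
            // A single read() may return fewer bytes than requested, so loop until full or EOF
            while (buffer.hasRemaining() && inChannel.read(buffer) != -1) {
                // keep reading
            }
            buffer.flip();
            // Do what you want
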
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

This code reads the file in a single thread. To read the entire file in parallel, you would have to adapt the code into a Runnable class and pass each thread's chunk offset and size in the constructor or elsewhere, as described in this question: Can I seek a file from different threads independently using FileChannel?
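To make that adaptation concrete, here is a minimal sketch rather than a definitive implementation. It assumes a read-only file and uses the positional `FileChannel.read(ByteBuffer, long)`, which does not modify the channel's own position, so a single shared channel can serve several threads. The class name `ParallelChannelRead`, the helpers `sumRange` and `parallelSum`, and the byte-summing "work" are hypothetical placeholders, not part of the original answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChannelRead {

    // Hypothetical "work": sums the unsigned byte values in [start, end).
    // Positional reads leave the shared channel's position untouched.
    static long sumRange(FileChannel channel, long start, long end) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192); // private to this call
        long sum = 0;
        long position = start;
        while (position < end) {
            buffer.clear();
            buffer.limit((int) Math.min(buffer.capacity(), end - position));
            int n = channel.read(buffer, position);
            if (n < 0) {
                break; // end of file reached earlier than expected
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF;
            }
            position += n;
        }
        return sum;
    }

    // Splits the file into `threads` contiguous chunks and processes them in parallel.
    static long parallelSum(Path file, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long chunk = (size + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                long start = Math.min(size, t * chunk);
                long end = Math.min(size, start + chunk);
                parts.add(pool.submit(() -> sumRange(channel, start, end)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for every chunk before closing the channel
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo input: a temporary file filled with a repeating byte pattern.
        Path file = Files.createTempFile("chunks", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);
        System.out.println(parallelSum(file, 4));
    }
}
```

As the comments below point out, every `read` call here goes through the channel rather than through a memory mapping, so this illustrates the chunking idea more than the mmap-based approach the question is after.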

UPDATE

Unfortunately, MappedByteBuffer is not thread-safe, as it is a subclass of Buffer (see: Does memory mapped file support concurrent get/put?), so threads that share one buffer instance need a synchronization mechanism in order to read it in parallel.

One approach would be to copy the entire file to a temporary one (this way you ensure that the file will never be modified), and then use a Runnable implementation like this:

   private class ThreadFileRead implements Runnable {

        private final long ini;
        private final long end;

        public ThreadFileRead(long ini, long end) {
            this.ini = ini;
            this.end = end;
        }

        @Override
        public void run() {
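            // map() takes (mode, position, size), so the third argument is the
            // chunk length, not the end offset
            try (RandomAccessFile file = new RandomAccessFile("FILEPATH", "r")) {
                MappedByteBuffer out = file.getChannel()
                        .map(FileChannel.MapMode.READ_ONLY, ini, end - ini);

                // Indices are relative to the start of this mapping
                for (int i = 0; i < out.limit(); i++)
                {
                    // do work, e.g. out.get(i)
                }

            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }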

        }

    }
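For a strictly read-only file, an alternative to mapping each chunk separately is to map the file once and hand each thread its own `duplicate()` of the buffer. A duplicate shares the mapped content but has an independent position and limit, so the threads do not disturb each other's read position. The following is only a sketch under that read-only assumption: `SharedMappingScan`, `countInRange` and `countByte` are hypothetical names, the byte-counting loop stands in for the real scan, and a single mapping cannot exceed `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedMappingScan {

    // Counts occurrences of `target` in [start, end) of the given view.
    // The view belongs to one thread only, so moving its position is safe.
    static long countInRange(ByteBuffer view, int start, int end, byte target) {
        view.limit(end);      // set the limit first so position(start) is always valid
        view.position(start);
        long count = 0;
        while (view.hasRemaining()) {
            if (view.get() == target) {
                count++;
            }
        }
        return count;
    }

    // Maps the file once, then scans `threads` contiguous chunks in parallel.
    static long countByte(Path file, byte target, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() rejects sizes above Integer.MAX_VALUE; larger files need several mappings.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int size = mapped.capacity();
            long chunk = ((long) size + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                int start = (int) Math.min(size, t * chunk);
                int end = (int) Math.min(size, start + chunk);
                // Same memory, but a separate position and limit for this task.
                ByteBuffer view = mapped.duplicate();
                parts.add(pool.submit(() -> countInRange(view, start, end, target)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo input: a temporary file with a repeating 0..6 byte pattern.
        Path file = Files.createTempFile("mapped", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        Files.write(file, data);
        System.out.println(countByte(file, (byte) 3, 4));
    }
}
```

Compared with one mapping per thread, this keeps a single mapping alive and only creates lightweight views per task. Whether it is actually faster than a single-threaded scan depends on the workload and is worth measuring.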
Francisco Hernandez
  • This doesn't seem to be using mmap. Wouldn't this approach require reading from disk each time "read" method is invoked? – marcin_koss Nov 22 '15 at 18:29
  • Yes... in each call this code will read from disk... Let me see if this can be combined with mmap – Francisco Hernandez Nov 22 '15 at 18:36
  • Thank you, this is helpful. I'm thinking in the end maybe it would be a good idea to divide file into as many mmaps as I want threads, store references to them and then each time I need to read, pass in those mmap references to runnable. Does this make sense? – marcin_koss Nov 22 '15 at 19:10
  • The problem with that approach would be that if a region of the file is changed by any program/thread, an unexpected exception will be thrown because the file changes are visible by mmap. You can see it here: http://docs.oracle.com/javase/7/docs/api/java/nio/MappedByteBuffer.html. If your file is never modified this approach is good, in any other case you will have unexpected behavior. If when you divide the file you copy it, you do not have those problems – Francisco Hernandez Nov 22 '15 at 19:17
  • Yes, I'm looking for a read-only solution. I will mention that in the question as well. – marcin_koss Nov 22 '15 at 19:20