
I have a 10GB PDF file that I would like to split into 10 files of 1GB each. I need to do this operation in parallel, which means spinning up 10 threads, each of which starts from a different position, reads up to 1GB of data, and writes it to its own file. The final result should be 10 files that together contain the contents of the original 10GB file.

I looked at FileChannel, but its position is shared, so once I modify the position in one thread, it affects the other threads. I also looked at AsynchronousFileChannel in Java 7, but I'm not sure that's the way to go. I'd appreciate any suggestions on this.
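For what it's worth, `FileChannel` also has a two-argument `read(ByteBuffer dst, long position)` overload that reads at an absolute offset without touching the channel's shared position, so concurrent readers don't step on each other. A minimal sketch (using a small temp file in place of the real 10GB PDF, which is an assumption for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalRead {
    public static void main(String[] args) throws IOException {
        // small sample file standing in for the real 10GB PDF
        Path src = Files.createTempFile("sample", ".txt");
        Files.write(src, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(src, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            // positional read: fills buf starting at absolute offset 8,
            // without modifying the channel's shared position
            int n = ch.read(buf, 8);
            buf.flip();
            byte[] out = new byte[n];
            buf.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints 89AB
        }
        Files.delete(src);
    }
}
```

Because the positional overload never seeks, several threads could share one channel and each pass its own offset, which avoids the shared-position problem entirely.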

I wrote this simple program that reads a small text file to test the FileChannel idea, but it doesn't seem to do what I'm trying to achieve.

package org.cas.filesplit;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConcurrentRead implements Runnable {

    private long myPosition = 0; // long, since an int offset tops out at 2GB

    public long getPosition() {
        return myPosition;
    }

    public void setPosition(long position) {
        this.myPosition = position;
    }

    static final String filePath = "C:\\Users\\temp.txt";

    @Override
    public void run() {
        try {
            readFile();
        } catch (IOException e) {
            e.printStackTrace();
        }       
    }

    private void readFile() throws IOException {
        Path path = Paths.get(filePath);

        // try-with-resources ensures the channel is closed even on an exception
        try (FileChannel fileChannel = FileChannel.open(path)) {
            fileChannel.position(myPosition);
            ByteBuffer buffer = ByteBuffer.allocate(8);
            int noOfBytesRead = fileChannel.read(buffer);

            while (noOfBytesRead != -1) {
                buffer.flip();
                System.out.println("Thread - " + Thread.currentThread().getId());
                while (buffer.hasRemaining()) {
                    System.out.print((char) buffer.get());
                }

                System.out.println(" ");
                buffer.clear();
                noOfBytesRead = fileChannel.read(buffer);
            }
        }
    }
}
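This is not the asker's solution, but one way to sketch the splitting itself: each worker thread opens its own channels (so no position is shared) and uses `FileChannel.transferTo` to copy its slice into a part file. The part count, file names, and the small sample file are illustrative assumptions, and Java 8 lambdas are used for brevity; it also assumes the file size divides evenly into the part count.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelSplit {
    public static void main(String[] args) throws Exception {
        // small sample file standing in for the real 10GB PDF
        Path src = Files.createTempFile("src", ".bin");
        Files.write(src, "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII));

        int parts = 3;
        long chunk = Files.size(src) / parts; // assumes an even split
        Thread[] workers = new Thread[parts];
        for (int i = 0; i < parts; i++) {
            final int part = i;
            workers[i] = new Thread(() -> {
                // each thread opens its own channels, so no position is shared
                try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                     FileChannel out = FileChannel.open(
                             src.resolveSibling("part" + part),
                             StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                             StandardOpenOption.TRUNCATE_EXISTING)) {
                    long start = part * chunk;
                    long remaining = chunk;
                    // transferTo may move fewer bytes than requested, so loop
                    while (remaining > 0) {
                        long moved = in.transferTo(start, remaining, out);
                        start += moved;
                        remaining -= moved;
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // the middle slice of "AAAABBBBCCCC" should be "BBBB"
        System.out.println(new String(
                Files.readAllBytes(src.resolveSibling("part1")),
                StandardCharsets.US_ASCII)); // prints BBBB
    }
}
```

`transferTo` takes an explicit start position, so like the positional `read` overload it sidesteps the shared channel position, and on many platforms it can hand the copy to the OS rather than moving bytes through a Java buffer.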
AlexCon
    Doing this in parallel will actually slow down the process because the disk will thrash. – OldCurmudgeon Aug 27 '15 at 15:51
  • Is this meant as some kind of exercise, or do you look for a productive solution? Because there surely should be some [library/API](http://www.oracle.com/technetwork/articles/java/compress-1565076.html) that does exactly that. – SME_Dev Aug 27 '15 at 15:51
  • For exercise purposes. For example, how does Hadoop read big files and break them up into smaller files? – AlexCon Aug 27 '15 at 17:05
  • I found the answer here: [Concurrent reading of a file (java preferred)](http://stackoverflow.com/questions/11867348/concurrent-reading-of-a-file-java-preffered) – AlexCon Aug 27 '15 at 20:13
  • Did it work as intended? – Ravindra babu Aug 29 '15 at 09:44
  • Possible duplicate of [Reading a single file with Multiple Thread: should speed up?](http://stackoverflow.com/questions/8809894/reading-a-single-file-with-multiple-thread-should-speed-up) – Raedwald Jan 20 '16 at 18:04

0 Answers