
I have a comparatively long file of unsigned integers (64 bits each, 0.47GB file) that I need to read and store in an array. After some brain racking I wound up using the type long, since everything in Java is signed (correct me if I'm wrong, please) and I couldn't think of a better alternative. Anyhow, the array only has to be sorted, so the precise values of the original numbers are not of the utmost importance. We're supposed to measure the efficiency of the sorting algorithm, nothing more. However, I came up against a brick wall when I actually came to reading the file (my code below).
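On the signedness point: Java's integral primitives other than `char` are signed, but a `long` still stores all 64 bits of an unsigned value unchanged. Only the interpretation differs. Since Java 8, `Long` offers unsigned helpers such as `Long.toUnsignedString` and `Long.compareUnsigned`. The minimal sketch below assumes Java 8 or later; the class name `UnsignedLongDemo` and the helper `sortUnsigned` are illustrative. It shows that values with the top bit set sort first under `Arrays.sort`, and one way to get unsigned order instead by flipping the sign bit around a signed sort.

```java
import java.util.Arrays;

public class UnsignedLongDemo {
    // Sorts in place by unsigned 64-bit value. XOR-ing with Long.MIN_VALUE flips
    // the sign bit, which maps unsigned order onto signed order; Arrays.sort then
    // does the work, and a second flip restores the original bit patterns.
    static void sortUnsigned(long[] a) {
        for (int i = 0; i < a.length; i++) a[i] ^= Long.MIN_VALUE;
        Arrays.sort(a);
        for (int i = 0; i < a.length; i++) a[i] ^= Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long max = 0xFFFFFFFFFFFFFFFFL;                  // all 64 bits set: 2^64 - 1 if read as unsigned
        System.out.println(max);                         // prints -1 (signed interpretation)
        System.out.println(Long.toUnsignedString(max));  // prints 18446744073709551615
        System.out.println(Long.compareUnsigned(1L, max) < 0); // prints true

        long[] signed = {max, 1L, 0L};
        Arrays.sort(signed);                             // signed order
        System.out.println(Arrays.toString(signed));     // prints [-1, 0, 1]

        long[] unsigned = {max, 1L, 0L};
        sortUnsigned(unsigned);                          // unsigned order
        System.out.println(Arrays.toString(unsigned));   // prints [0, 1, -1]
    }
}
```

For a pure sorting benchmark the signed order is just as valid a test input, so the flip only matters if the unsigned order itself is required.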

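import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;

public class ReadFileTest {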
    public static void main(String[] args) throws Exception {
        String address = "some/directory";
        File input_file = new File (address);
        FileInputStream file_in = new FileInputStream(input_file);
        DataInputStream data_in = new DataInputStream (file_in );

        long [] array_of_ints = new long [1000000];
        int index = 0;

        long start = System.currentTimeMillis();

        while(true) {
            try {
                long a = data_in.readLong();
                index++;
                System.out.println(a);
            }
            catch(EOFException eof) {
                System.out.println ("End of File");
                break;
            }
        }

        System.out.println(index);
        System.out.println(System.currentTimeMillis() - start);
    }
}

It goes on and on forever, and I usually step out to have lunch while the programme is reading. All in all, 20 minutes is the fastest I've achieved so far. A course mate bragged today that his programme read it in 4 seconds. He's working in C++, and I know that C++ is faster than Java, but this is ridiculous. Could somebody please tell me what I'm doing wrong here? I can't blame it on the language or the machine, so it must be me. From what I can see, though, the Java tutorials use exactly the same class, i.e. DataInputStream. I also saw FileChannels being recommended a couple of times. Are they the only way out?
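Regarding FileChannels: they are one option rather than the only one. The minimal sketch below memory-maps the file and copies it into a `long[]` in one bulk call. It assumes Java 7+, that the whole file fits in a single mapping (under 2 GB, which a 0.47 GB file does), and that the values are big-endian, the byte order `DataInputStream.readLong` uses. If the file was written as raw integers by a C++ program on an x86 machine it is probably little-endian, in which case `ByteOrder.LITTLE_ENDIAN` would be the one to pass. The class and method names are illustrative, and the path is the placeholder from the question.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelLongReader {
    // Reads every complete 8-byte value in the file into a long[] via a memory mapping.
    // A single mapping cannot exceed Integer.MAX_VALUE bytes; trailing bytes that do
    // not form a whole long are ignored by the LongBuffer view.
    static long[] readLongs(Path path, ByteOrder order) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            map.order(order);                    // must be set before creating the view
            LongBuffer longs = map.asLongBuffer();
            long[] result = new long[longs.remaining()];
            longs.get(result);                   // bulk copy into the array
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "some/directory"); // placeholder path
        long start = System.currentTimeMillis();
        long[] values = readLongs(path, ByteOrder.BIG_ENDIAN);
        System.out.println(values.length + " values in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```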

ROMANIA_engineer
user699149
  • Does your mate's program also print everything to standard output? I bet most of the time goes there. Comment out the println in the read loop and try again. – Ingo Apr 08 '11 at 19:07
  • Also make sure you're using the same setup he is. If you're using a 5400 RPM HDD and he's using an SSD, he's going to smoke you no matter what language you're using. – DHall Apr 08 '11 at 19:18
  • How many times do you have lunch every day? (j/k) – asgs Apr 08 '11 at 19:22
  • Also, for your 0.47 GB file you might want to use a longer array. You might try `(int) (input_file.length() / 8)` as the length of the array. – Paŭlo Ebermann Apr 09 '11 at 00:57
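Following the first and last comments, a sketch of the read loop with the `println` removed and the array sized from the file length (one slot per 8 bytes). It assumes Java 7+ for try-with-resources; the class and method names are hypothetical. Taking an `InputStream` parameter leaves the choice of buffering to the caller: with a bare `FileInputStream`, each `readLong` still results in an unbuffered read on the file.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadIntoArray {
    // Reads exactly `count` big-endian longs from `in` into a new array.
    // Nothing is printed inside the loop: writing tens of millions of lines
    // to the console would dominate the running time.
    static long[] readLongs(InputStream in, int count) throws IOException {
        DataInputStream data = new DataInputStream(in);
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = data.readLong(); // throws EOFException if the input is shorter than expected
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        File inputFile = new File(args.length > 0 ? args[0] : "some/directory"); // placeholder path
        // One slot per 8-byte value; a trailing partial value is ignored.
        int count = (int) (inputFile.length() / 8);
        long start = System.currentTimeMillis();
        try (InputStream in = new FileInputStream(inputFile)) {
            long[] values = readLongs(in, count);
            System.out.println(values.length);
        }
        System.out.println(System.currentTimeMillis() - start);
    }
}
```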

2 Answers


You should use buffered input, something like:

new DataInputStream(
    new BufferedInputStream(
        new FileInputStream(input_file)))
Jonathon Faust
Kyle Dewey
  • Also, try different buffer sizes. Don't assume that the default buffer size is the best, especially since you are reading such a large number of bytes. – Kelly S. French Apr 08 '11 at 19:12
  • In general I haven't found increasing the buffer above the default of `8192` to help much, even in native languages. Having _very small_ buffers of a few tens or hundreds of bytes is really slow, but once you hit 8192 you are probably getting 90% or more of the maximum performance. – BeeOnRope Feb 10 '17 at 20:41
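To act on the buffer-size comments, a small timing harness built on the buffered stack from the answer. It assumes Java 7+ and a file whose length is a multiple of 8 (trailing bytes are ignored). The sizes tried are arbitrary points of comparison apart from 8192, which is `BufferedInputStream`'s default; the class and method names are illustrative.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class BufferSizeTest {
    // Reads the whole file as big-endian longs through a BufferedInputStream
    // with the given buffer size.
    static long[] readBuffered(File file, int bufferSize) throws IOException {
        int count = (int) (file.length() / 8);
        long[] values = new long[count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), bufferSize))) {
            for (int i = 0; i < count; i++) {
                values[i] = in.readLong();
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "some/directory"); // placeholder path
        int[] sizes = {512, 8192, 64 * 1024, 1024 * 1024};
        for (int size : sizes) {
            long start = System.nanoTime();
            long[] values = readBuffered(file, size);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("buffer " + size + " bytes: " + values.length + " values in " + ms + " ms");
        }
    }
}
```

The first pass may include cold-disk time that later passes avoid because the operating system caches the file, so repeating or reordering the list gives a fairer comparison.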

If you want to read objects from the file:

new ObjectInputStream(
    new BufferedInputStream(
        new FileInputStream(new File(file_name))))

More about the difference

Community
jsingh
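A caveat on the second answer: `ObjectInputStream` reads Java serialization streams, and its constructor expects the header that `ObjectOutputStream` writes. A file of raw 64-bit values, like the one in the question, normally lacks that header, so the constructor fails with a `StreamCorruptedException`. The sketch below uses illustrative names, with in-memory byte arrays standing in for files. It shows a round trip through the object streams and the header check failing on raw data.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class ObjectStreamLongs {
    // Writes longs with ObjectOutputStream, which prefixes a serialization header.
    static byte[] writeWithObjectStream(long[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(bytes))) {
            for (long v : values) out.writeLong(v);
        } // close() flushes the block data and the buffer into `bytes`
        return bytes.toByteArray();
    }

    // Reads `count` longs back through a buffered ObjectInputStream, as in the answer.
    static long[] readWithObjectStream(byte[] data, int count) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)))) {
            long[] values = new long[count];
            for (int i = 0; i < count; i++) values[i] = in.readLong();
            return values;
        }
    }

    // True if the bytes start with a valid serialization stream header.
    static boolean hasObjectStreamHeader(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (IOException e) { // StreamCorruptedException for raw data
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        long[] values = {1L, 2L, 3L};
        byte[] serialized = writeWithObjectStream(values);
        System.out.println(Arrays.toString(readWithObjectStream(serialized, values.length)));

        // Raw big-endian longs, like the file in the question, have no such header.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(raw)) {
            for (long v : values) out.writeLong(v);
        }
        System.out.println(hasObjectStreamHeader(serialized));
        System.out.println(hasObjectStreamHeader(raw.toByteArray()));
    }
}
```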