4

I have a file of around 4 MB; it is an ASCII file containing normal keyboard characters only. I tried several classes in the java.io package to read the file contents as a String. Reading it character by character (using FileReader and BufferedReader) takes approximately 40 seconds, and reading it with the java.nio package (FileChannel and ByteBuffer) takes approximately 25 seconds. To my knowledge that is rather a lot of time. Does anyone know a way to reduce this to somewhere around 10 seconds? Even a solution such as writing a file reader in C and calling it from Java would do. I used the snippet below to read the 4 MB file in 22 seconds:

public static String getContents(File file) {
    try {
        if (!file.exists() || !file.isFile()) {
            return null;
        }
        FileInputStream in = new FileInputStream(file);
        FileChannel ch = in.getChannel();
        ByteBuffer buf = ByteBuffer.allocateDirect(512);
        Charset cs = Charset.forName("ASCII");
        StringBuilder sb = new StringBuilder();
        int rd;
        while ((rd = ch.read(buf)) != -1) {
            buf.flip(); // limit = bytes actually read, position = 0
            CharBuffer chbuf = cs.decode(buf);
            sb.append(chbuf); // append all decoded characters
            buf.clear();
        }
        in.close();
        String contents = sb.toString();
        System.out.println("File Contents:\n" + contents);
        return contents;
    } catch (Exception exception) {
        System.out.println("Error:\n" + exception.getMessage());
        return null;
    }
}
Ranjan Sarma
  • why are you reading byte by byte? You know the size of the file - allocate a byte array of sufficient size to hold the contents of the file and read it completely using read(). – mcfinnigan Apr 10 '12 at 11:12

3 Answers

5

I can't imagine what your hardware could be, but it should take less than 0.1 seconds for a 4 MB file.

A fast way to read the file all at once is to read it into a byte[]

public static String readFileAsString(File file) {
    try {
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        byte[] bytes = new byte[(int) file.length()];
        in.readFully(bytes);
        in.close();
        return new String(bytes, 0); // ASCII text only.

    } catch (FileNotFoundException e) {
        return null;
    } catch (IOException e) {
        System.out.println("Error:\n" + e.getMessage());
        return null;
    }
}

public static void main(String... args) throws IOException {
    File tmp = File.createTempFile("deleteme", "txt");
    tmp.deleteOnExit();

    byte[] bytes = new byte[4 * 1024 * 1024];
    Arrays.fill(bytes, (byte) 'a');
    FileOutputStream fos = new FileOutputStream(tmp);
    fos.write(bytes);
    fos.close();

    long start = System.nanoTime();
    String s = readFileAsString(tmp);
    long time = System.nanoTime() - start;
    System.out.printf("Took %.3f seconds to read a file with %,d bytes%n",
            time / 1e9, s.length());
}

prints

Took 0.026 seconds to read a file with 4,194,304 bytes

If you want to read the file even faster, I suggest using a memory-mapped file, as it will take less than 10 milliseconds, but that is overkill in this case.
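
For reference, a minimal sketch of the memory-mapped approach (the method name and error handling here are illustrative, not from this answer):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

public static String readFileMapped(File file) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    try {
        FileChannel ch = raf.getChannel();
        // Map the whole file; nothing is copied onto the Java heap up front.
        MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        // Decode the mapped bytes as ASCII in one pass.
        return Charset.forName("US-ASCII").decode(mbb).toString();
    } finally {
        raf.close();
    }
}

Note that a single MappedByteBuffer is limited to 2 GB and, as the comments below discuss, there is no well-defined time at which the mapping is released.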

Peter Lawrey
  • Yes, you are right, it takes very little time to read the file (88 ms for 5 MB). But it takes a long time (40 sec) to display on the console (System.out). What is the reason? Can someone explain? I thought the 40 seconds was spent reading, but actually it takes <1 second to read and up to 40 seconds to write to System.out? – Ranjan Sarma Apr 10 '12 at 11:24
  • The console is slow to update the screen. If you are using the MS-DOS console, it is very slow. – Peter Lawrey Apr 10 '12 at 11:30
  • If you want to know why the MS-DOS console is slow, it's because it hasn't changed much since it was first created. http://en.wikipedia.org/wiki/86-DOS – Peter Lawrey Apr 10 '12 at 11:32
  • System.out is incredibly slow. It is spooling to the operating system console. See http://stackoverflow.com/questions/4437715/why-is-system-out-println-so-slow for a longer explanation. – mcfinnigan Apr 10 '12 at 11:34
  • @mcfinnigan System.out is not slow when redirected to a file. Try `java MyClass > output.txt` and you will find it's very fast. – Peter Lawrey Apr 10 '12 at 11:40
  • Be careful with `in.read(bytes)`. It may not fill the entire buffer as you would expect. – Tony the Pony Apr 10 '12 at 12:29
  • This is true of most streams, but not files. A better way to write it is to use DataInputStream. Note: write() has the same problem. – Peter Lawrey Apr 10 '12 at 12:29
  • @PeterLawrey No. It is true of all streams, per the contract of `InputStream`, regardless of its actual type. In the case of FileInputStream, consider the case of a final block where the file size isn't a multiple of the buffer size. It isn't true of stream writes: they block as necessary until the buffer is empty. I think you meant 'a better way to *read* it is with `DataInputStream.readFully()`', and I would agree with that except for the problem of the final buffer. Your solution above doesn't work for file sizes > 32 bits and doesn't scale in general. – user207421 Apr 11 '12 at 03:28
  • @EJP, This reads the wrong way (double negative). It is true that read() will only read a small block of data for most streams and only readFully will read the entire buffer or throw an exception in all cases. The solution doesn't work for file size of >= 2^31, but fortunately most files are much smaller than this. For files of a significant size I would use a Memory Mapped File (with one or more buffers as required) – Peter Lawrey Apr 11 '12 at 05:52
  • @PeterLawrey For files of significant size I would *not* use memory mapped files, as there is no time at which the VM is ever released. There is no double negative in my comment. – user207421 Apr 11 '12 at 10:15
  • @EJP Do you mean you can't guarantee when the MappedByteBuffers will be released? On the Sun/Oracle/OpenJDK you can, but only by using an internal API. :| – Peter Lawrey Apr 11 '12 at 10:37
  • @PeterLawrey That is correct. There is a very long-standing argy-bargy about this in a bug report. The MBB isn't released when the FileChannel it came from is closed, and according to the bug report there is no well-defined time at which it could be released, not even via GC. – user207421 Apr 11 '12 at 18:39
  • @EJP I am willing to live with `((DirectBuffer) mbb).cleaner().clean();` but it's definitely not for everyone. – Peter Lawrey Apr 11 '12 at 19:54
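
As a side note on the `in.read(bytes)` discussion in the comments above, here is a rough sketch (mine, not from any comment) of the loop that `DataInputStream.readFully()` effectively performs for streams where a single `read()` may return fewer bytes than requested:

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Keep reading until the buffer is full; a single read(byte[]) call may
// legitimately return fewer bytes than requested.
public static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
        int n = in.read(buf, off, buf.length - off);
        if (n < 0)
            throw new EOFException("Stream ended after " + off + " bytes");
        off += n;
    }
}
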
2
  1. There is no benefit in using direct byte buffers here.
  2. Your buffer size of 512 is too small. Use at least 4096.
  3. There is no real benefit to using NIO here. As this is text, I would use a BufferedReader (see the sketch after this list).
  4. Your basic objective of reading the entire file into memory is flawed. It will not scale, and it already uses excessive amounts of memory. You should devise a strategy for handling the file a line at a time.
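
A minimal sketch of points 3 and 4 (the method name and the per-line handling are placeholders, not from this answer):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public static long countLines(File file) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    try {
        long lines = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Handle one line at a time here; memory use stays bounded
            // regardless of how large the file grows.
            lines++;
        }
        return lines;
    } finally {
        reader.close();
    }
}
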
user207421
1

You can increase your buffer size, say to 2048 or 4096 bytes.

Don't go with native APIs, as you won't get Java features like compile-time type checking.

rohit