
I am coding a little Java-based tool to process mysqldump files, which can become quite large (up to a gigabyte for now). I am using this code to read and process the file:

BufferedReader reader = getReader();
BufferedWriter writer = getWriter();

char[] charBuffer = new char[CHAR_BUFFER_SIZE];
int readCharCount;
StringBuffer buffer = new StringBuffer();

while( ( readCharCount = reader.read( charBuffer ) ) > 0 )
{
    buffer.append( charBuffer, 0, readCharCount );
    //processing goes here
}

What is a good size for the charBuffer? At the moment it is set to 1000, but my code works with any size. What is best practice, or can the size be calculated from the file size?

Thanks in advance, greetings, philipp

philipp
  • Oracle's `BufferedReader` already uses a default buffer of `8192`. – Sotirios Delimanolis Sep 25 '13 at 14:31
  • I don't know if there's a standard for this as it'll depend on your available memory. I would recommend experimenting with it at different sizes to see how it affects your performance – StormeHawke Sep 25 '13 at 14:31
  • Or perhaps @SotiriosDelimanolis knows more about it than I do... lol – StormeHawke Sep 25 '13 at 14:32
  • @SotiriosDelimanolis how do you know the default size of BufferedReader? Can you share a link? – NFE Sep 25 '13 at 14:32
  • @StormeHawke AFAIK the best values are 4096 and 8192; which value to use depends entirely on your hard drive speed. – Luiggi Mendoza Sep 25 '13 at 14:32
  • For what it's worth, `StringBuffer` is a fairly heavyweight object. It's thread-safe and everything; it doesn't seem like the best option for... much of anything. It shouldn't be confused with an NIO buffer. – chrylis -cautiouslyoptimistic- Sep 25 '13 at 14:34
  • @SotiriosDelimanolis but then you get the wrong idea. Neither the OpenJDK nor the HotSpot source code is **the spec**. If the size is not specified in the Javadoc, then you **must not** assume it will always be 8192; note that JRockit or the IBM JVM can change this. – Luiggi Mendoza Sep 25 '13 at 14:34
  • @SotiriosDelimanolis I don't see a size like that, only an initial capacity of 16 characters. – NFE Sep 25 '13 at 14:37
  • @NFE Please rephrase that, I don't understand what you mean. – Sotirios Delimanolis Sep 25 '13 at 14:38
  • @SotiriosDelimanolis sorry, I meant: where did you find that the default buffer is 8192? I just want to know; I googled but didn't find any link. – NFE Sep 25 '13 at 14:41
  • Please don't use StringBuffer, use StringBuilder instead. http://vanillajava.blogspot.com/2013/04/why-synchronized-stringbuffer-was-never.html – Peter Lawrey Sep 25 '13 at 14:43
  • @NFE Just like Luiggi Mendoza stated, it depends on the implementation. I was referring to Oracle JDK 7's implementation. The [OpenJDK also seems to use that size](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedReader.java). – Sotirios Delimanolis Sep 25 '13 at 14:43
  • I tried 4096 and 8192, which seem to be faster. What happens if the value becomes too high? – philipp Sep 25 '13 at 14:44
  • @SotiriosDelimanolis I am checking in jdk6. Thanks :) – NFE Sep 25 '13 at 15:00
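The comments suggest two changes to the question's loop: pass an explicit size to `BufferedReader` rather than relying on an implementation-specific default, and use `StringBuilder` instead of `StringBuffer`. Below is a minimal sketch with both changes, assuming the input and output are available as a plain `Reader` and `Writer` (the question's `getReader()` and `getWriter()` are not shown). The class name `DumpProcessor`, the method `process`, the 8192 sizes, and the pass-through "processing" step are hypothetical placeholders.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class DumpProcessor {
    // Hypothetical sizes, stated explicitly so behavior does not depend on
    // whatever default a particular JVM's BufferedReader happens to use.
    static final int IO_BUFFER_SIZE = 8192;
    static final int CHAR_BUFFER_SIZE = 8192;

    // Copies 'in' to 'out' chunk by chunk and returns the number of chars read.
    // The "processing" is a placeholder that passes each chunk through unchanged.
    static long process(Reader in, Writer out) throws IOException {
        long total = 0;
        try (BufferedReader reader = new BufferedReader(in, IO_BUFFER_SIZE);
             BufferedWriter writer = new BufferedWriter(out, IO_BUFFER_SIZE)) {
            char[] charBuffer = new char[CHAR_BUFFER_SIZE];
            StringBuilder buffer = new StringBuilder(); // not synchronized, unlike StringBuffer
            int readCharCount;
            while ((readCharCount = reader.read(charBuffer)) != -1) {
                buffer.append(charBuffer, 0, readCharCount);
                // processing goes here; this sketch writes the chunk out and clears the
                // builder, whereas real processing may need to keep a statement that spans chunks
                writer.append(buffer);
                buffer.setLength(0);
                total += readCharCount;
            }
        } // closing the BufferedWriter also flushes it
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        long count = process(new StringReader("INSERT INTO t VALUES (1);\n"), out);
        System.out.println(count + " chars copied: " + out);
    }
}
```

In OpenJDK's implementation, when the array passed to `read` is at least as large as the `BufferedReader`'s internal buffer (and no mark is set), the characters are read directly into that array, so the two buffers do not simply stack.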

1 Answer


It should always be a power of 2. The optimal value depends on the OS and disk format. In code I've seen, 4096 is often used, but the bigger the better.

Also, there are better ways to load a file into memory.
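Because the optimum depends on the OS and disk, and a comment on the question suggests experimenting, measuring on the target machine is the practical route. A rough sketch that times a full read of one file at several power-of-two chunk sizes, assuming the dump is UTF-8 and its path is the first command-line argument; `BufferSizeTiming` and `readAll` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BufferSizeTiming {
    // Reads the whole file using a char[] of the given size; returns the number of chars read.
    // Assumption: the dump is UTF-8. Files.newBufferedReader reports malformed input
    // by throwing an IOException instead of silently replacing characters.
    static long readAll(Path file, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        long total = 0;
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BufferSizeTiming <dump-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Power-of-two sizes from 1 KiB to 64 KiB.
        for (int size = 1 << 10; size <= 1 << 16; size <<= 1) {
            long start = System.nanoTime();
            long chars = readAll(file, size);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%6d chars/read: %d chars in %d ms%n", size, chars, millis);
        }
    }
}
```

Treat the numbers as indicative only: the first pass warms the OS page cache and the JIT, which favors the sizes tested later, so run it more than once. For serious measurements, a harness such as JMH is more reliable.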

mikeslattery
  • More than a power of 2, it should be a power of 1024. – Luiggi Mendoza Sep 25 '13 at 14:35
  • I tried a bunch of values that were powers of 2, and all ran fine. Except for very small values, I could not see any extraordinary performance gains, but I guess that is due to my implementation. – philipp Oct 03 '13 at 07:36
  • It could be due to a lot of factors. Many OSes and disk controllers are smart enough to read blocks ahead of time. When that occurs, your buffer size doesn't matter much, except for the cost of round-tripping to the OS API. – mikeslattery Oct 03 '13 at 20:08
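The point about round-tripping can be made concrete: with an unbuffered source, the chunk size decides how many read calls reach the underlying reader. A small sketch, assuming a `StringReader` stands in for the file and each counted call stands in for one trip to the OS; `ReadCallCounter`, `CountingReader`, and `countReads` are hypothetical names.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

class ReadCallCounter {
    // Wraps a Reader and counts the bulk read calls passed through to it.
    static final class CountingReader extends FilterReader {
        int calls;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            return super.read(cbuf, off, len);
        }
    }

    // Drains 'source' using a char[] of the given size and returns how many calls it took,
    // including the final call that reports end of stream (-1).
    static int countReads(Reader source, int bufferSize) throws IOException {
        CountingReader counting = new CountingReader(source);
        char[] buf = new char[bufferSize];
        while (counting.read(buf) != -1) {
            // data is discarded; only the number of calls matters here
        }
        return counting.calls;
    }

    public static void main(String[] args) throws IOException {
        char[] chars = new char[100_000];
        Arrays.fill(chars, 'x');
        String data = new String(chars);
        for (int size : new int[] {16, 1024, 8192}) {
            System.out.println(size + " -> " + countReads(new StringReader(data), size) + " calls");
        }
    }
}
```

For 100,000 characters this works out to ⌈100000 / size⌉ + 1 calls, i.e. 6,251, 99 and 14. With a real file, each such call may turn into a native read, which is the per-call overhead the comment refers to; once the chunk is a few KiB the count is already small, which may explain why only very small sizes made a visible difference in philipp's test.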