
I'm trying to modify a serial program that reads from a database and writes the results to a file. This is currently done in a blocking way, and I think we can get a performance boost if there is a memory buffer and the file is written in the "background" asynchronously

I can think of the "job interview" solution, using Threads, shared resources, synchronized blocks etc... but I'm sure there is a better way (is there a nice little "delayed write" library out there that will do it for me?)

Does any of the java.util.concurrent package offer any help? java.nio? or perhaps JMS/ActiveMQ?

What about PipedOutputStream / PipedInputStream as a basis for my buffer?

How do I implement a delayed / background / buffered / non-blocking / asynchronous file writer in Java?

Edit:

Based on the suggestions, and to avoid having this question closed (as I think it's still relevant, based on the answers, comments, and votes), here is an attempt to make it more focused. (I'm keeping the original question above so the answers will still remain in context, as there are some very good ones.)

  • What are the practical differences between PipedOutputStream / PipedInputStream (or PipedReader / PipedWriter) and a BlockingQueue?
  • Is it preferable to use the latter for asynchronous file writing? (Or is this an apples-to-oranges comparison, and if so, I'd like to know why.)
Eran Medan
  • Your post is more of a discussion topic than a question, which is off-topic for SO. Try some of your suggestions and then post specific questions. – Jim Garrison May 02 '12 at 05:23
  • I am interested in a discussion, and I respect that this is off-topic for SO, but forgive me for again being off topic: where are such questions on-topic? I'm sure there are many like me who don't need SO for specific programming questions; I will figure out the code myself (someone has asked it before 99% of the time). But for "out of a million ways to do it, what is the best-practice way", that is the real peer opinion I'm looking for. Is this what http://programmers.stackexchange.com stands for? Should I just move my question there? I want to do the right job, not just do the job right – Eran Medan May 02 '12 at 06:02
  • Well I like this (kind of) question +1 – keuleJ May 02 '12 at 06:14

3 Answers


You probably want to use a bounded blocking queue between the producer (the database reader) and the consumer (the file writer).

Java's ArrayBlockingQueue does the job quite nicely. The producer blocks if the buffer is full, avoiding any issues with consuming too much memory.

Doing the producing and consuming in concurrent threads is best achieved using Java's Executors framework.
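
A minimal sketch of this pattern (class, file, and sentinel names are illustrative): an ArrayBlockingQueue bounds the in-memory buffer, an ExecutorService runs the producer and consumer, and a poison-pill value tells the consumer there is nothing more to write.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedQueueSketch {
    // Sentinel telling the consumer there is nothing more to write.
    private static final String POISON_PILL = "\u0000EOF";

    public static void main(String[] args) throws Exception {
        // Bounded queue: the producer blocks on put() once 1024 rows are buffered.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        Path out = Files.createTempFile("rows", ".txt");
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Consumer: drains the queue and writes each row to the file.
        pool.submit(() -> {
            try (BufferedWriter w = Files.newBufferedWriter(out)) {
                String row;
                while (!(row = queue.take()).equals(POISON_PILL)) {
                    w.write(row);
                    w.newLine();
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
            return null;
        });

        // Producer: stands in for the database reader.
        pool.submit(() -> {
            for (int i = 0; i < 100; i++) {
                queue.put("row " + i); // blocks if the file writer falls behind
            }
            queue.put(POISON_PILL);
            return null;
        });

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("lines written: " + Files.readAllLines(out).size());
    }
}
```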

K Erlandsson
  • This is the kind of answer I was looking for, I will try and use both BlockingQueue and Executors. I'm actually going to use the Writer snippet used for this answer http://stackoverflow.com/a/3604974/239168. By the way, is there a good reason why a BlockingDeque was used there and not a BlockingQueue? – Eran Medan May 02 '12 at 06:13
  • I haven't looked at the link, but in a standard producer-consumer scenario there should not be any need for a deque over a queue. – K Erlandsson May 02 '12 at 06:25

I can think of the "job interview" solution, using Threads, shared resources, synchronized blocks etc... but I'm sure there is a better way (is there a nice little "delayed write" library out there that will do it for me?)

I've never come across a "delayed write" library. But I suppose it would really just be an output stream / writer that writes to a queue or circular buffer, with a private thread that reads from the queue / buffer and does the blocking output. This should work, but it might be difficult to avoid the cost of double-copying the data.
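
For illustration, a hypothetical sketch of such a writer (class names are made up): write() only enqueues a copy and returns immediately, and a private drainer thread does the actual blocking output. Note the String copy in write() is exactly the double-copy cost mentioned above.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical "delayed write" Writer: write() only enqueues; a private
 *  thread performs the actual (blocking) output to the target writer. */
public class QueueingWriter extends Writer {
    private static final String STOP = ""; // zero-length sentinel
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread drainer;

    public QueueingWriter(Writer target) {
        drainer = new Thread(() -> {
            try {
                String chunk;
                while (!(chunk = queue.take()).isEmpty()) {
                    target.write(chunk);
                }
                target.close();
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        drainer.start();
    }

    @Override public void write(char[] buf, int off, int len) {
        if (len > 0) {
            queue.add(new String(buf, off, len)); // copy, then return immediately
        }
    }

    @Override public void flush() { /* the drainer writes as fast as it can */ }

    @Override public void close() {
        queue.add(STOP);
        try { drainer.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter(); // stand-in for a FileWriter
        try (QueueingWriter w = new QueueingWriter(sink)) {
            w.write("hello, ");
            w.write("world");
        } // close() waits for the drainer to empty the queue
        System.out.println(sink);
    }
}
```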

Does any of the java.util.concurrent package offer any help?

Possibly.

java.nio?

Possibly.

or perhaps JMS/ActiveMQ?

I doubt it ... if your goal is to write to a local file.

What about PipedOutputStream / PipedInputStream as a basis for my buffer?

That could help. But you still need to implement the threads that read / write the streams.
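
A small sketch of what those threads look like with a pipe (names are illustrative; the consumer prints instead of writing to a file for brevity). The pipe's internal circular buffer plays the role of the bounded queue:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

public class PipedSketch {
    public static void main(String[] args) throws Exception {
        PipedWriter producerEnd = new PipedWriter();
        // The second argument sizes the pipe's internal circular buffer;
        // the producer blocks when it is full, just like a bounded queue.
        PipedReader consumerEnd = new PipedReader(producerEnd, 8192);

        // Consumer thread: would write to the file; prints here for brevity.
        Thread fileWriter = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(consumerEnd)) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("wrote: " + line);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        fileWriter.start();

        // Producer: the database-reading side writes into the pipe.
        try (BufferedWriter out = new BufferedWriter(producerEnd)) {
            out.write("row 1\n");
            out.write("row 2\n");
        } // closing the writer signals end-of-stream to the reader

        fileWriter.join();
    }
}
```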


Ignoring the mechanics of how you might implement this, I suspect that you won't get a significant speed-up by doing asynchronous I/O in this case. If you profile the application in its current form, you will probably find that the primary bottleneck is getting data from the database. Writing to the file is likely to be orders of magnitude faster. If that is the case, then overlapping database and file I/O is unlikely to give a significant speed-up.

And if file output does turn out to be a bottleneck, then a simpler way to speed it up would be to increase the output stream buffer size. This is simple to do: just pass an extra buffer-size argument to the BufferedOutputStream constructor. You should try that before embarking on a major rewrite.
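
Concretely (file name is arbitrary), that one-line change looks like this:

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BigBufferDemo {
    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("results", ".txt");
        // The default BufferedOutputStream buffer is 8 KiB; the second
        // constructor argument raises it to 1 MiB, reducing the number
        // of underlying write() system calls.
        try (OutputStream os = new BufferedOutputStream(
                Files.newOutputStream(out), 1 << 20)) {
            os.write("one row of results\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("bytes on disk: " + Files.size(out));
    }
}
```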

Stephen C
  • This is a very helpful answer; the best piece of code is the code you realize you don't need to write. I thought initially that even if file IO is not the bottleneck, there would be at least some improvement if I don't wait for it, but I agree I need to profile first. I suspect that file IO is indeed orders of magnitude faster; this by itself would make this a very good answer, but your comment on BufferedOutputStream is priceless, another "obviously, but how didn't I think of it" moment – Eran Medan May 02 '12 at 06:29
  • Discussing other ways to speed it up, you should look into increasing the fetch size for your SQL ResultSets if you are reading large sets of data. That can give huge speed boosts by avoiding chattiness in the DB communication. – K Erlandsson May 02 '12 at 06:32
  • Thanks, you mean by stmt.setFetchSize(n)? – Eran Medan May 02 '12 at 06:36
  • Yes, that works, or you can set it on the ResultSet after you have received it; both work. – K Erlandsson May 02 '12 at 06:45
  • Although this answer was the most useful to me personally, I'll respect the votes and the fact that @KristofferE's answer is more fitting to the title of the question, and will accept his answer. But this is one of the most useful answers I've gotten on SO. – Eran Medan May 02 '12 at 07:21
  • @KristofferE - setFetchSize did the trick, network latency to the DB with a low fetch size (default was 10) was most of the bottleneck (as Stephen accurately predicted). Now that this is resolved the IO + some processing percentage of the total time increased to about 30%, which might justify experimenting with async processing. Thanks again for taking the time to answer an almost closed question – Eran Medan May 02 '12 at 20:50

Java 7 has asynchronous I/O in 'NIO 2'.
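
For illustration, a minimal NIO.2 sketch (file name is arbitrary): AsynchronousFileChannel.write returns a Future, so the caller can keep working and only block when it actually needs the result.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class Nio2Sketch {
    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("async", ".txt");
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                out, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap("hello\n".getBytes(StandardCharsets.UTF_8));
            Future<Integer> pending = ch.write(buf, 0); // returns immediately
            // ... do other work here while the OS writes in the background ...
            System.out.println("bytes written: " + pending.get()); // block only when you must
        }
    }
}
```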

If you can't use Java 7, I frankly wouldn't bother trying to implement it at all. You could do something horrible involving Futures, but the benefit would probably be zero, if not negative.

user207421