
Is there a generally-accepted way to return a large list of objects using Java EE?

For example, if you had a database ResultSet that held millions of objects, how would you return those objects to a (remote) client application?

Another example -- that is closer to what I'm actually doing -- would be to aggregate data from hundreds of sources, normalize it, and incrementally transfer it to a client system as a single "list".

Since all the data cannot fit in memory, I was thinking that a combination of a stateful SessionBean and some sort of custom Iterator that called back to the server would do the trick.

So, in other words, if I have an API like Iterator<Data> getData() then what's a good way to implement getData() and Iterator<Data>?

How have you successfully solved this problem in the past?

hallidave

3 Answers


Definitely don't duplicate the entire DB into Java's memory. That makes no sense; it only makes things unnecessarily slow and memory-hungry. Instead, introduce pagination at the database level: query only the data you actually need to display on the current page, as Google does.

If you have a hard time implementing this properly or figuring out the SQL query for your specific database, have a look at this answer. For the JPA/Hibernate equivalent, have a look at this answer.


Update: as per the comments (which actually change the entire subject of the question...), here's a basic (pseudocode) kickoff example:

List<Source> inputSources = createItSomehow();
Source outputSource = createItSomehow();

for (Source inputSource : inputSources) {
    while (inputSource.next()) {
        outputSource.write(inputSource.read());
    }
}

This way only a single entry is held in Java's memory at any time, instead of the entire collection as in the following (inefficient) example:

List<Source> inputSources = createItSomehow();
List<Entry> entries = new ArrayList<Entry>();

for (Source inputSource : inputSources) {
    while (inputSource.next()) {
        entries.add(inputSource.read());
    }
}

Source outputSource = createItSomehow();

for (Entry entry : entries) {
    outputSource.write(entry);
}
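
For a version of the per-entry approach that actually compiles, here is a minimal sketch. It assumes each input source can be exposed as a plain java.util.Iterator; the names StreamingDemo and ConcatenatingIterator are made up for illustration and are not part of any library. It presents many sources as a single Iterator while holding only one entry at a time:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class StreamingDemo {

    /** Hypothetical helper: walks several iterators in order as if they were one. */
    static final class ConcatenatingIterator<T> implements Iterator<T> {
        private final Iterator<? extends Iterator<? extends T>> sources;
        private Iterator<? extends T> current;

        ConcatenatingIterator(Iterable<? extends Iterator<? extends T>> sources) {
            this.sources = sources.iterator();
        }

        @Override
        public boolean hasNext() {
            // Skip exhausted (or empty) sources until an entry is available.
            while (current == null || !current.hasNext()) {
                if (!sources.hasNext()) {
                    return false;
                }
                current = sources.next();
            }
            return true;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the real input sources.
        List<Iterator<String>> inputs = Arrays.asList(
                Arrays.asList("a", "b").iterator(),
                Collections.<String>emptyList().iterator(),
                Arrays.asList("c").iterator());

        Iterator<String> all = new ConcatenatingIterator<String>(inputs);
        while (all.hasNext()) {
            // Stand-in for outputSource.write(...): one entry in, one entry out.
            System.out.println(all.next());
        }
    }
}
```

Nothing is buffered beyond the entry currently being handed over, so memory use does not grow with the number of sources or entries.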
BalusC
  • Thanks for the answer. The database was just an example. I'm not using a database and the information is not being displayed to a user. I'm basically aggregating data from a large number of sources and transferring it to another application. – hallidave Jan 05 '11 at 20:42
  • Then I would do it on a per-entry basis. Process and send each single entry from input to output immediately, without holding the complete input in Java's memory. – BalusC Jan 05 '11 at 20:47
  • So what would the implementation look like? I was thinking the API would be "Iterator getData()", but wasn't sure of the best way to implement the Iterator. It sounds like you're talking about a message based solution? – hallidave Jan 05 '11 at 20:58

Pagination is a good solution when working with a web-based UI. Sometimes, however, it is much more efficient to stream everything in one call. The rmiio library was written explicitly for this purpose and is already known to work in a variety of app servers.

jtahlborn

If your list is huge, you must assume that it can't fit in memory. Or at least that, if your server needs to handle many such requests concurrently, you run a high risk of an OutOfMemoryError.

So basically, what you do is paging with batch reads. Say you load a thousand objects from your database and send them to the client in the response to its request; you then loop until all objects have been processed. (See the answer from BalusC.)
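
The batch loop above could back the Iterator<Data> API from the question roughly as follows. This is only a sketch under assumptions: PageSource, PagingIterator and inMemory are hypothetical names, and the fetch(offset, limit) contract stands in for whatever remote call or query actually delivers one batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class PagingDemo {

    /** Hypothetical contract: returns at most `limit` entries starting at `offset`. */
    interface PageSource<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Client-side iterator that keeps only one batch in memory at a time. */
    static final class PagingIterator<T> implements Iterator<T> {
        private final PageSource<T> source;
        private final int pageSize;
        private Iterator<T> page = Collections.<T>emptyList().iterator();
        private int offset = 0;
        private boolean exhausted = false;

        PagingIterator(PageSource<T> source, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            this.source = source;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() {
            if (page.hasNext()) {
                return true;
            }
            if (exhausted) {
                return false;
            }
            List<T> batch = source.fetch(offset, pageSize);
            offset += batch.size();
            // A short (or empty) batch means the source has nothing more.
            exhausted = batch.size() < pageSize;
            page = batch.iterator();
            return page.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.next();
        }
    }

    /** In-memory stand-in for the real server or database call. */
    static <T> PageSource<T> inMemory(final List<T> data) {
        return (offset, limit) -> {
            int from = Math.min(offset, data.size());
            int to = Math.min(from + limit, data.size());
            return new ArrayList<T>(data.subList(from, to));
        };
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        Iterator<Integer> it = new PagingIterator<Integer>(inMemory(data), 4);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Because the client passes the offset on every call, the server side of such a contract would not need to keep per-client state. The trade-off is that offset-based paging assumes the underlying data keeps a stable order between calls.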

The problem is the same on the client side, and you'll likely need to stream the data to the file system to prevent an OutOfMemoryError there as well.
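
On the client, spooling to disk might look like the sketch below. It assumes the entries arrive as an Iterator<String> and contain no line breaks; SpoolDemo and spoolToTempFile are hypothetical names, and a real client would use whatever serialization fits its data.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;

class SpoolDemo {

    /** Writes each entry to a temp file as one line, so entries never pile up in memory. */
    static Path spoolToTempFile(Iterator<String> entries) {
        try {
            Path target = Files.createTempFile("spool-", ".txt");
            try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                while (entries.hasNext()) {
                    writer.write(entries.next());
                    writer.newLine();
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the entries streamed from the server.
        Path file = spoolToTempFile(Arrays.asList("first", "second", "third").iterator());
        System.out.println("Spooled to " + file + " (" + file.toFile().length() + " bytes)");
        file.toFile().delete();
    }
}
```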

Please also note: it is okay to load millions of objects from a database as an administrative task, such as performing a backup or an export in some 'exceptional' case. But you should not expose it as a request that any user can make. It will be slow and drain server resources.

Nicolas Bousquet