2

Is it worth taking a count of the results in SQL, and then passing back two values from the jdbc call (count, result) so that I can determine the sizes and preallocate arrays instead of using arraylists to hold the contents of the results?

It seems like there's no way to win: either I spend more time in the db counting the rows, or spend time in the java app when using arraylists.

Note:

  • My result set will have counts no more than, say, 100,000.
  • results will have mostly integers or doubles (no strings).
  • Preallocated arrays I assume are better because there's random access on these arrays later.
  • The number of arrays is about 3-4 times the number of columns returned.
  • These arrays will get accessed multiple times each.
  • my program will be called multiple times as a service, so the cumulative overhead matters from the point of scalability both from the space as well as time perspective. That is another reason I'm trying to use arrays.
  • I do not wish to start a debate on array vs. arraylists. I just wish to understand which of these will be more suitable in the context of my use.

Thanks.

KS1
  • 165
  • 1
  • 10
  • 1
    "Preallocated arrays I assume are better because there's random access on these arrays later." What makes you think that random access via ArrayList will be slow? Have you actually got a simple version of the code working and tested the existing performance? If not, that should be your first step. – Jon Skeet Dec 10 '14 at 12:03
  • 1
    arraylist internally uses a normal array. this is set to some initial size, that's called 'capacity' in arraylist fields (you can set this on your own during arraylist initialization). if you need more space, arraylist reallocates more space for the array (half of the previous capacity). it's simple and effective. you won't see any benefit with your 'array approach'. and I really think you could have written a test for this in 15 minutes. – MarianP Dec 10 '14 at 12:08
  • Don't bother using arrays - you can trust Java collections to do their jobs well. About 15 years ago I wrote primitive int implementations of all the java.util collections so I wouldn't have to box my ints in a number-centric application. The performance difference under load was so trivial I threw it all away. If the demands of your application actually warranted this discussion, you probably wouldn't be coding in Java. However, if your service includes pagination then you would have to get a count anyhow to determine number of pages... – gknicker Dec 10 '14 at 12:56
  • Thanks for the comments. I have not heavily used arraylists before, but reading up [link](http://docs.oracle.com/cd/A97688_16/generic.903/bp/java.htm#1007056), [link](http://stackoverflow.com/questions/19690728/performance-primitive-array-vs-arraylist), [link](http://stackoverflow.com/questions/716597/array-or-list-in-java-which-is-faster), etc., it seemed that there *might* be some advantage to using arrays for me. @MarianP, I wasn't sure how to test it in a server setup for scalabiity with multiple users (effects on mem. usage,gc and whether that will even matter), so asked this question. – KS1 Dec 11 '14 at 03:39

3 Answers3

4

It doesn't make sense to me to run an additional query just to get the count.

If you must use an arrays, you'd be better off first reading the data from the ResultSet to ArrayLists, and then creating arrays based on the size() of the ArrayLists and copying the data to them.

However, I don't think that access to ArrayLists is slower than access to arrays. After all, ArrayLists are backed by an actual array, and have the same random access :

Here's ArrayList::get:

public E get(int index) 
{
    RangeCheck(index);
    return (E) elementData[index];
}

The only thing that would slow you down when working with ArrayLists is if they will have to be resized multiple times while you are inserting data to them, but you only have this problem because you don't know the size of the data in advance, which is the precise reason why you can't use arrays.

Eran
  • 387,369
  • 54
  • 702
  • 768
3

The difference between filling a presized array with 100k elements and filling an arraylist with 100k elements is at most a few microseconds which is several orders of magnitude smaller than whatever interaction you are having with the database. So getting the count is most likely going to take more time than the speed difference between the two options.

So I would keep it simple and use an ArrayList.

UNLESS you are doing heavy computations with the doubles and integers, in which case using a primitive array (int[] or double[]) would make sense (but I would probably still use an ArrayList for filling data from the db and convert to an array before the computations).

assylias
  • 321,522
  • 82
  • 660
  • 783
2

As other have pointed out ArrayList is essentially an array in a fairly light wrapper. So you don't ask much of a question.

What is also mentioned above is you can pre-size ArrayList either at creation using the ArrayList(int initialCapacity) constructor or later using the ensureCapacity(int minCapacity) method.

It will generally be more efficient to ensure that capacity if establishing a reasonable capacity is not onerous. I say that because depending on how your result set is provided it may internally cursor through the whole data set to establish the result-set size and you might easily throw away all the gains getting the count! However if you just know the list will be large, ignorantly throwing 100000 at it will still probably be worth it. Try to slightly over rather than under estimate and use trimToSize() if you need the slack back at the end.

In most modern systems 100,000 is a large size but not huge. However if there are problems such as not being able to allocate a large contiguous block of memory in a long running system or resource bound process there's a simple balance of allocating 'chunks' (e.g. 10,000 elements) and indexing them with chunk = index/10000 and inChunkIndex= index%10000. Notice that the access costs of such an object are damn near a single contiguous block.

However such cleverness isn't something you should go to straight away.

Persixty
  • 8,165
  • 2
  • 13
  • 35