20

Java experts emphasize the importance of avoiding premature optimization, and focusing instead on clean OO design. I am trying to reconcile this principle in the context of rewriting a program that uses a large array of long elements (a few million). It seems that using an ArrayList would consume about 3x the memory of a primitive array of longs, and wasting that much RAM seems like a legitimate concern to me.

I am basing this off an experiment I did using MemoryTestBench class described here. My test and output are as follows:

package memory;

import java.util.ArrayList;
import java.util.List;

public class ArrayListExperiment {

public static void main(String[] args) {

    ObjectFactory arrayList = new ObjectFactory() {
        public Object makeObject() {
            List<Long> temp = new ArrayList<Long>(1000);
            for (long i=0; i<1000; i++)
                temp.add(i);
            return temp;
        }
    };

    ObjectFactory primitiveArray = new ObjectFactory() {
        public Object makeObject() {
            long[] temp = new long[1000];
            for (int i=0; i<1000; i++)
                temp[i] = i;
            return temp;
        }
    };

    MemoryTestBench memoryTester = new MemoryTestBench();
    memoryTester.showMemoryUsage(primitiveArray);
    memoryTester.showMemoryUsage(arrayList);
}
}

and output:

memory.ArrayListExperiment$2 produced [J which took 8016 bytes
memory.ArrayListExperiment$1 produced java.util.ArrayList which took 24968 bytes

My question is: How can I reap the benefits of an OO List and still retain the small memory footprint of a primitive array? I think guava might provide the answer, but glancing through the API it's not obvious to me which class to use in place of ArrayList.

Thanks for any suggestions.

Jason Hall
  • 20,632
  • 4
  • 50
  • 57
Jonah
  • 15,806
  • 22
  • 87
  • 161
  • Is this MemoryTestBranch correct? I have shortly gone through the article and saw some interesting approaches like System.gc() one after another for a few times. – rit Dec 21 '11 at 06:43
  • 4
    Actually, this sounds about right. A Double costs a reference, plus the actual primitive double. At a bare minimum in a 64 bit JVM you'll pay twice the cost for Double than you will for a double, and there's some additional overhead. – rfeak Dec 21 '11 at 06:46
  • 1
    rit, I'm not qualified to comment on the cleanliness of the method in that article, but I do believe the memory results are correct – Jonah Dec 21 '11 at 06:46
  • What are you trying to do? ArrayList will always cost more because it only holds Objects (Double in your case). Since each Object/Double is really a reference, it will cost you an additional space in memory for each Double vs double. (1000 in your case) (reference + actual value of the double). – LazyCubicleMonkey Dec 21 '11 at 06:48
  • 1
    Its not uncommon to give a Java application 1 GB of memory. 3 million `Double`s in a list: ~ .07 GB. Seems like a small memory footprint :-) – SingleShot Dec 21 '11 at 06:51
  • @SingleShot, I'm not asking because it's physically impossible to solve the problem with ArrayList on my machine. The point is: Why should I have to waste 2x or 3x RAM just to get a nice OO wrapper around my arrays? – Jonah Dec 21 '11 at 06:59
  • My comment was tongue-in-cheek, but to answer your question, its not a "waste" - you get a nice OO wrapper that implements several interfaces, has an iterator, can be passed to algorithms that take in lists/collections on `Collections` and elsewhere. This makes your life easier and really, you aren't going to miss the RAM. If you disagree and do miss the RAM, you must give up something in order to save it - all that data in memory at once, or your algorithm efficiency, or your money for more physical RAM, or perhaps - your OO data structures. – SingleShot Dec 21 '11 at 07:55
  • If it's something to do with primitives (including primitive arrays) just remember to look in the `common.primitives` package in Guava. – ColinD Dec 21 '11 at 15:27
  • You could use http://code.google.com/p/memory-measurer/ to measure memory footprint much more directly (see for example some numbers I derived through it: http://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures) – Dimitris Andreou Dec 25 '11 at 03:20
  • Translating a double[] to List of size N: - Object[] uses 1 less word than double[] - the referenced Double object though needs 2 extra words (for the object header), plus 2 extra words for the double value itself. Thus if you naively translate a double[] to (a materialized) List, you incur a cost of 3N words (in a 32bit system, that's 12N bytes). Given that double[] is 2N words, going to 5N words constitutes a 150% footprint increase (if you were using 2 megabytes earlier, the new representation will be 5 megabytes). – Dimitris Andreou Dec 25 '11 at 03:25

6 Answers6

16

I think what you looking for in Guava is Doubles.asList

sanjary
  • 1,206
  • 1
  • 9
  • 10
  • 1
    Speaking as a Guava developer, this is indeed the way to do things, especially since, as you mention, your list size is fixed. – Louis Wasserman Dec 21 '11 at 11:52
  • @LouisWasserman: Does the Guava version provide anything that `Arrays.asList` doesn't? – Timothy Jones Dec 21 '11 at 12:42
  • 7
    @TimothyJones: `Arrays.asList` doesn't work for primitive arrays... pass in a `double[]` and you get a single element `List`. – ColinD Dec 21 '11 at 13:32
  • Indeed. A List is not what you want; you want a List. Arrays.asList doesn't work at all for your use case. – Louis Wasserman Dec 21 '11 at 15:32
  • @LouisWasserman, When you set elements of this list, is a Double object temporarily created and then destroyed, and only the primitive backing array permanently updated? – Jonah Dec 21 '11 at 15:48
  • @Jonah: You can [check the code](http://docs.guava-libraries.googlecode.com/git-history/v11.0/javadoc/src-html/com/google/common/primitives/Doubles.html#line.400), but yes if you passed a primitive `double` to `set` it would be autoboxed to match the parameter type and then unboxed again. – ColinD Dec 21 '11 at 15:53
  • Sadly, it appears that while the guava wrapper solves my RAM problem, doing memory lookups on the List is much slower than doing the equivalent array lookups on a long[] (and this is what I'll be using the List or array for). Here is my test showing a 2x speed increase doing lookups on the native array: http://pastebin.com/pH1LYbR9 – Jonah Dec 21 '11 at 20:23
  • 4
    @Jonah: Obviously there's going to be some performance difference, so if you _really, really_ need only the absolute fastest that you can possibly get in Java (maybe you do) then just using the array directly is the only option. The Trove option might also be a bit faster since it doesn't implement `List` and thus can use primitive types in its API. By the way, you should really be using `System.nanoTime()` for benchmarking. – ColinD Dec 22 '11 at 02:04
  • 3
    Agreed. If you need the List interface, Guava is the best you're going to find, if not, then you should possibly just use Trove or the array directly. – Louis Wasserman Dec 24 '11 at 17:21
11

You might consider using Trove, which provides support for primitive collections, for example the TDoubleArrayList class:

A resizable, array-backed list of double primitives.

Edit: It's true that this class doesn't implement List, but that's Java's price of avoiding boxed primitives. Guava's solution is the most versatile, while Trove is best for more extreme performance requirements.

Community
  • 1
  • 1
Paul Bellora
  • 54,340
  • 18
  • 130
  • 181
5

I think you are looking for FastUtil's DoubleArrayList - it's backed by a primitive array.

If your collection is REALLY big (larger than 2^31 elements) you may also want to look at their BigArrays

Timothy Jones
  • 21,495
  • 6
  • 60
  • 90
3

Write your own implementation of ArrayList that uses an array of primitives. Copy the current ArrayList code and replace the internal Object[] with a double[].

Should be a pretty straight foward copy and replace.

EDIT: Biggest danger to memory consumption is going to be the "grow". It will briefly take up at least twice the space, plus the additional room you grow. If you can't pre-size the array to avoid this, you may want to consider a slightly different implementation that uses multiple arrays as it grows over time. A bit more math on inserting and indexing, but shouldn't be tooooo bad.

rfeak
  • 8,124
  • 29
  • 28
1

Arrays.asList(T...) may be what you're looking for. It returns an instance of List<T> backed by the array passed to it.

mange
  • 3,172
  • 18
  • 27
  • 4
    He'd still pay the initial overhead of creating the Double[], which he appears to want to avoid. – rfeak Dec 21 '11 at 06:45
  • Keep in mind that the resulting List will be fixed-size, so you won't be able to call `add()` on it without errors. If you want to be able to add to the list then you'll need a regular `ArrayList`, with the memory overhead that entails. – Jason Hall Dec 21 '11 at 06:46
  • Jason, the list size is fixed and known in advance for my particular problem, fwiw – Jonah Dec 21 '11 at 06:48
  • @rfeak: You're completely right and this actually doesn't solve his problem at all. My bad. – mange Dec 21 '11 at 06:53
  • I actually think this might solve it. I ran the following test: http://pastebin.com/ZATz10vR. The "asList" method uses only 8064 bytes. I will leave the question up for a while longer to see other suggestions and thoughts. – Jonah Dec 21 '11 at 07:07
  • Well, that surprises me. It must be some sort of boxing magic. If it works for you: cool! – mange Dec 21 '11 at 07:38
  • 2
    @Jonah: Since you aren't actually assigning the result of `Arrays.asList` to anything in that code, you aren't noticing that the result is a `List` and not a `List`. – ColinD Dec 21 '11 at 13:34
  • ColinD, ah, ok that makes sense. Seems like guava is definitely the way to go – Jonah Dec 21 '11 at 15:36
1

It's a good question - performance vs code-cleanliness. I think you have grounds to be less concerned about clean OO design and simply focus on creating a good solution to the specific problem of working with a large array of longs. If you do so, keeping the performance-oriented code in one class/package will minimise its impact on overall design. Assumedly managing the large list of longs is only a small part of a bigger application...

Paul Jowett
  • 6,513
  • 2
  • 24
  • 19