2

Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.

The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).

Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?

Jadiel de Armas
  • Absolute numbers of bytes do not always translate into actual performance problems; object overhead doesn't necessarily inflate CPU time that badly, only memory consumption. – Louis Wasserman Feb 19 '15 at 20:24
  • [This advice](http://stackoverflow.com/a/24375096/3448419) applies to your question as well. – apangin Feb 20 '15 at 23:09
  • @apangin, that is exactly the kind of advice I was looking for. Thanks a lot! – Jadiel de Armas Feb 21 '15 at 04:11
  • Everyone should read Jon Louis Bentley's classic 1982 book Writing Efficient Programs. (Out of print but find a copy.) [An online summary.](http://www.crowl.org/lawrence/programming/Bentley82.html) – philipxy Mar 06 '15 at 10:19
  • http://www.javabench.in/2016/11/while-developing-programming-in-java.html – Raúl Nov 10 '16 at 07:52

7 Answers

4

Some techniques I use to reduce memory:

  • Make your own IntArrayList (etc) class that prevents boxing
  • Make your own IntHashMap (etc) class where keys are primitives
  • Use nio's ByteBuffer to store large arrays of data efficiently (and, with ByteBuffer.allocateDirect, in native memory outside the heap). It's like a byte array but has methods to store/retrieve all primitive types at any arbitrary offset (this trades some speed for memory).
  • Don't use pooling because pools keep unused instances explicitly alive.
  • Use threads sparingly; they're super memory hungry (in native memory, outside the heap).
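  • When making substrings of big strings and discarding the original, the substrings may still refer to the original's character array (this was the case up to Java 7u6; later versions copy the characters). On older JVMs, use new String(substring) so the old big string can be collected.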
  • A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
  • Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
    • Use an explicitly sized StringBuilder instead of string concatenation, because the compiled class files (at least up to Java 8) use new StringBuilder() without an initial capacity to concatenate strings.
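
As a rough illustration of the first bullet, a hand-rolled int list might look like the sketch below. The class name IntArrayList comes from the bullet; the method names, the doubling growth policy, and the bounds checks are assumptions made for this example, not a reference implementation.

```java
import java.util.Arrays;

/** Minimal growable list of ints that never boxes its elements (illustrative sketch). */
final class IntArrayList {
    private int[] data;
    private int size;

    /** Choosing the initial capacity well avoids reallocation in the typical case. */
    IntArrayList(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("negative capacity: " + initialCapacity);
        }
        // Keep at least one slot so that doubling always grows the array.
        data = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == data.length) {
            // Assumed growth policy: double the backing array when it is full.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}
```

Each element costs 4 bytes in the backing int[] instead of a reference plus a separate Integer object, which is where the saving comes from.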
Mark Jeronimus
3

Depends on the application, but generally speaking

  • Lay out data structures in (parallel) arrays of primitives

  • Try to make big "flat" objects, inlining otherwise sensible sub-structures

  • Specialize collections of primitives

  • Reuse objects, use object pools, ThreadLocals

  • Go off-heap

I cannot call these practices "best" because, unfortunately, they make you suffer: you lose much of the point of using Java, and they reduce the flexibility, supportability, reliability, testability and other "good" properties of the codebase.

But they certainly lower the memory footprint and GC pressure.
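
To make the first bullet concrete, here is a minimal sketch of the parallel-array layout for a collection of 2-D points. The class PointColumns and its methods are hypothetical names chosen for this example; the idea is only that n points become two double[] arrays instead of n separate objects.

```java
/** Stores n 2-D points as two primitive arrays instead of n Point objects (illustrative sketch). */
final class PointColumns {
    private final double[] xs;
    private final double[] ys;

    PointColumns(int n) {
        xs = new double[n];
        ys = new double[n];
    }

    void set(int i, double x, double y) {
        xs[i] = x;
        ys[i] = y;
    }

    double x(int i) {
        return xs[i];
    }

    double y(int i) {
        return ys[i];
    }

    int size() {
        return xs.length;
    }

    /** Sum of squared distances from the origin; walks both arrays sequentially. */
    double sumOfSquaredNorms() {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] * xs[i] + ys[i] * ys[i];
        }
        return sum;
    }
}
```

The cost is exactly the one described above: a point is no longer a value you can pass around, only an index into the columns.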

leventov
  • Good advice. What do you mean by going off-heap? – Jadiel de Armas Feb 20 '15 at 13:09
  • @JadieldeArmas `ByteBuffer.allocateDirect()`; `FileChannel.map()`; or higher-level things on top of it, for example OpenHFT stuff (see [Chronicle Map](https://github.com/OpenHFT/Chronicle-Map), [Chronicle Queue](https://github.com/OpenHFT/Chronicle-Queue)) – leventov Feb 20 '15 at 16:16
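
A minimal sketch of the first option named in the comment above, ByteBuffer.allocateDirect(), used as a fixed-size array of longs in native memory. The wrapper class OffHeapLongArray and its methods are hypothetical; only the ByteBuffer calls are JDK API.

```java
import java.nio.ByteBuffer;

/** Fixed-size array of longs backed by a direct (off-heap) buffer (illustrative sketch). */
final class OffHeapLongArray {
    private final ByteBuffer buffer;
    private final int length;

    OffHeapLongArray(int length) {
        this.length = length;
        // allocateDirect reserves native memory; the contents start out zeroed.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES));
    }

    long get(int index) {
        // Absolute read: does not move the buffer's position.
        return buffer.getLong(checkIndex(index) * Long.BYTES);
    }

    void set(int index, long value) {
        // Absolute write at the byte offset of the requested element.
        buffer.putLong(checkIndex(index) * Long.BYTES, value);
    }

    int length() {
        return length;
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    private int checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return index;
    }
}
```

The data does not count against the Java heap, but the native memory is released only when the buffer object itself is garbage collected.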
3

One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.

Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.

Here is an example of a memory leak that can occur if you are implementing, for instance, a stack:

Integer stack[];
stack = new Integer[10];
int stackPtr = 0;

// a few push operations on our stack.
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);

// and pop from the stack again
--stackPtr;
--stackPtr;

// at this point, the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are basically leaked.

The correct solution is to clear the slot on every pop, so that the array no longer references the popped object:

stack[--stackPtr] = null;
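
Wrapped into a small class, the same idea might look like the sketch below. ArrayStack and its method names are hypothetical, and the doubling growth policy is an assumption; the essential line is the one that nulls the slot in pop().

```java
import java.util.Arrays;
import java.util.EmptyStackException;

/** Array-backed stack that clears each slot on pop so popped objects can be collected (sketch). */
final class ArrayStack<T> {
    private Object[] elements = new Object[10];
    private int size;

    void push(T value) {
        if (size == elements.length) {
            // Assumed growth policy: double the backing array when it is full.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        T top = (T) elements[--size];
        elements[size] = null; // drop the array's reference, otherwise the object leaks
        return top;
    }

    boolean isEmpty() {
        return size == 0;
    }
}
```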
Kevin Keane
2

If you have high-performance constraints and need to use collections for simple types, you might take a look at some implementations of primitive collections for Java.

Some are:

Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?

Luís Bianchin
1

Luís Bianchin already gave you a few libraries which implement optimal collections in Java. Nevertheless, it seems that you are especially concerned about the memory allocation of Java collections. In that case, there are a few alternatives which are quite straightforward.

  1. Cache

You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only keep the most frequently used entries in main memory and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
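
To show the size-bounding idea without pulling in a library, the sketch below uses the JDK's LinkedHashMap as a stand-in; it is not Guava Cache and has none of its features (expiry, loading, statistics, thread safety). The class name LruCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded map that evicts the least recently used entry (illustrative, not thread-safe). */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting a third key evicts whichever of the first two was touched least recently, so memory use stays bounded regardless of how many distinct keys are seen.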

  2. Persistent Collections

Sometimes a cache is not a solution to your problem. For example, in an ETL solution you might know that you will load each entry only once. For this scenario I recommend persistent collections: disk-backed collections that can be much faster than traditional databases but still have nice Java APIs. MapDB is for me the best library of this kind. Note that PCollections, despite the name, provides persistent collections in the functional-programming sense (immutable, in-memory structures) rather than disk storage.

  3. Profile memory usage

On top of that, if you really want to know the actual state of your program's memory allocation, I highly recommend using a profiler. This way you will know not only how much memory your collections occupy, but also how the GC behaves over time.

In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.

VisualVM, a profiler that ships with the JDK (up to JDK 8; it is a separate download for later versions), does a great job. Nevertheless, I recommend a commercial profiler if you can afford one. Commercial profilers usually have a lower impact on the application's performance than VisualVM.

  4. Memory-optimal data is nice with the network.

Finally, although it's not strictly related to your question, it is closely connected: if you want to serialize your Java objects into a compact binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using very little bandwidth, with really fast encoding/decoding.

Pau Carre
1

Well, there are a lot of things you can do.

Here are a few problems and solutions:

  1. When you change the value of a string in Java, the string is not actually overwritten, because strings are immutable. Instead, a new string is created to replace the old one, and the old string stays in memory until it is garbage collected. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:

    • When using a string to specify something like the "state" of an object or anything else that can only have a specific set of possible values, don't use a string. Instead use an enum. If you don't know what an enum is or how to use one yet, here's a link to a tutorial on what enums are and how to use them!
    • If you are using a string as a variable whose value will change at some point in the program, don't define a string how you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder is a class used to build and modify character sequences. When you change its contents, StringBuilder doesn't create a new string to replace the old one; it modifies its internal character buffer in place (growing it only when needed). Therefore, since you aren't creating a new string on every change, this saves RAM. Here is a link to the StringBuilder class in the Java API.
  2. Writer and reader objects such as FileWriter and FileReader also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:

    • All reader and writer objects have a method called close(). It releases the underlying system resources (such as file handles and buffers) held by the reader or writer; the object itself is reclaimed by the garbage collector once it is no longer referenced. Whenever you reach the point in your code where you know you will never use the reader or writer again, call this method.
  3. Every object in java takes up memory. When you have an object that you won't use anymore, it's not very convenient to keep it around.

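    • The Object class has a method called finalize(), but it is not a way to free memory on demand: the garbage collector may call it before reclaiming an object, and you should not call it yourself. When you aren't going to use an object anymore, drop every reference to it (for example, let the variable go out of scope or set the field to null) so the garbage collector can free the RAM.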
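
For point 2, one common way to make sure close() is always called is a try-with-resources block (Java 7+). The sketch below is only an illustration: the class ResourceDemo, the method roundTripFirstLine, and the temp-file name are made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ResourceDemo {
    /** Writes text to a temporary file, reads back its first line, and deletes the file. */
    static String roundTripFirstLine(String text) {
        try {
            Path file = Files.createTempFile("resource-demo", ".txt");
            try {
                // The writer is closed automatically when this block exits, even on an exception.
                try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                // Same for the reader: close() runs before the method returns.
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```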
Nicholas Greene
  • On point 3, finalize actually is more akin to a destructor in C++. You are not supposed to ever call it; the garbage collector will call it shortly before the object is collected. It is not very reliable, though; there is basically no good way to predict when or how this method is called (or even whether it is called at all). – Kevin Keane Mar 05 '15 at 09:35
-1

Beware of premature optimisation. See When is optimisation premature?

While I don't know the exact requirements of your application or runtime environment, in my experience Java was able to handle anything I threw at it. Doing some profiling on your demo/proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory leaks) is an issue.

Community
Snymant