4

I am reading an XML document into HashMaps and ArrayLists so that the relationships are maintained in memory as well. My code does the job, but I am worried about the iterations and function calls I perform on these huge maps and lists. Currently the XML data I am working with is not very large, but I don't know what will happen if it grows. What test cases should I run on the logic that uses these HashMaps? How bad is using the Java collections for such huge data? Are there any alternatives? Could the huge data cause the JVM to crash?

user1061293
  • I suggest you try generating large and huge XML data files and see what happens. When you get an idea of how much it can handle, try using a profiler to see how you can make it more efficient. – Peter Lawrey Dec 27 '11 at 10:43

5 Answers

12

Java collections have a certain overhead, which can increase the memory usage a lot (20 times in extreme cases) when they're the primary data structures of an application and the payload data consists of a large number of small objects. This could lead to the application terminating with an OutOfMemoryError even though the actual data is much smaller than the available memory.

  • ArrayList is actually very efficient for large numbers of elements, but inefficient when you have a large number of lists that are empty or contain only one element. For those cases, you could use Collections.emptyList() and Collections.singletonList() to improve efficiency (see the sketch after this list).
  • HashMap has the same problem as well as a considerable overhead for each element stored in it. So the same advice applies as for ArrayList. If you have a large number of elements, there may be alternative Map implementations that are more efficient, e.g. Google Guava.
  • The biggest overheads happen when you store primitive values such as int or long in collections, as they need to be wrapped in objects (Integer, Long). In those cases, the GNU Trove collections offer an alternative.
  • In your case specifically, the question is whether you really need to keep the entire data from the XML in memory at once, or whether you can process it in small chunks. This would probably be the best solution if your data can grow arbitrarily large.
  • The easiest short term solution would be to simply buy more memory. It's cheap.
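A minimal sketch of the empty/singleton idea from the first bullet; the childrenOf helper and the String element type are placeholders I made up for illustration, only the java.util calls are standard:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChildListExample {

    // Hypothetical helper: builds the per-node child list parsed from the XML.
    static List<String> childrenOf(List<String> parsed) {
        if (parsed.isEmpty()) {
            // Shared immutable instance: no per-node backing array at all.
            return Collections.emptyList();
        }
        if (parsed.size() == 1) {
            // One small object instead of an ArrayList with a backing array.
            return Collections.singletonList(parsed.get(0));
        }
        // ArrayList is fine once the list is genuinely populated;
        // trimToSize() drops the unused tail of the backing array.
        ArrayList<String> copy = new ArrayList<>(parsed);
        copy.trimToSize();
        return copy;
    }
}
```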
Michael Borgwardt
2

The JVM will not crash in the scenario you describe. What may happen is an OutOfMemoryError. Also, if you retain the data in those collections for a long time, you may run into garbage collection issues. Do you really need to store the whole XML data in memory?
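One crude way to see how close the loaded collections bring you to an OutOfMemoryError is to watch the heap around the loading code with the standard Runtime API; a rough sketch only, a profiler gives far better data:

```java
public class HeapWatch {
    // Prints a rough snapshot of current heap usage; call it before and
    // after loading the XML to see what the maps and lists actually cost.
    public static void printHeapUsage(String label) {
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.printf("%s: used %,d MB of max %,d MB%n",
                label, usedBytes / (1024 * 1024), rt.maxMemory() / (1024 * 1024));
    }
}
```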

cherouvim
1

If you are dealing with temporary data and you need fast access to it, you do not have too many alternatives. The question is what you mean by "huge": megabytes? Gigabytes? Terabytes?

As long as your data does not exceed 1 GB, IMHO holding it in memory may be OK. Otherwise you should think about alternatives such as a database (relational or NoSQL), files, etc.

In your specific example, I'd think about replacing ArrayList with LinkedList unless you need random access to the list. ArrayList is just a wrapper over an array, so when you need 1 million elements it allocates an array 1 million elements long. A linked list is better when the number of elements is big but you mostly iterate, since access by index is O(n). If you need both (i.e. a huge list and fast access by index), use a TreeMap with the index as the key instead; you will get O(log n) access time (see the sketch below).
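A small sketch of the TreeMap-as-indexed-list idea described above; the String element type and the one-million element count are placeholders:

```java
import java.util.TreeMap;

public class IndexedTreeMapExample {
    public static void main(String[] args) {
        // Index -> element; keys stay sorted, so lookups are O(log n).
        TreeMap<Integer, String> byIndex = new TreeMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            byIndex.put(i, "element-" + i);
        }
        // "Random access" by index without allocating a 1M-element array up front.
        String halfway = byIndex.get(500_000);
        System.out.println(halfway);
    }
}
```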

AlexR
  • Hmm. Bad advice. With current hardware, the limit should be about half a Terabyte (data and containing structure). That's what fits in a reasonably priced workstation. – Stephan Eggermont Nov 16 '12 at 14:17
0

What test cases should I run on the logic that uses these HashMaps?

Why not generate large XML files (for example, 5 times larger than your current data samples) and check your parsers and in-memory storage against them? Since only you know what files are possible in your case and how fast they will grow, this is really the only solution (a generator sketch is below).
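For example, a throwaway generator along these lines can produce arbitrarily large test files; the records/record element names and the payload are invented, only the javax.xml.stream API is standard:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class BigXmlGenerator {
    public static void main(String[] args) throws IOException, XMLStreamException {
        int records = 5_000_000; // scale this up until loading starts to hurt
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get("big-test.xml"))) {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("records");
            for (int i = 0; i < records; i++) {
                xml.writeStartElement("record");   // hypothetical element name
                xml.writeAttribute("id", Integer.toString(i));
                xml.writeCharacters("payload-" + i);
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        }
    }
}
```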

How bad is using the Java collections for such huge data? Are there any alternatives? Could the huge data cause the JVM to crash?

Of course, it is possible that you will get an OutOfMemoryError if you try to store too much data in memory and it is not eligible for GC. This library, http://trove.starlight-systems.com/, claims to use less memory, but I haven't used it myself. Some discussion is available here: What is the most efficient Java Collections library?
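If you do evaluate Trove, its main selling point is primitive-keyed collections; a rough sketch assuming the Trove 3.x package layout (class and package names may differ between versions):

```java
// Assumes the Trove 3.x artifact is on the classpath; the package and class
// names below are from that version and may differ in others.
import gnu.trove.map.hash.TIntObjectHashMap;

public class TroveSketch {
    public static void main(String[] args) {
        // int keys are stored unboxed, avoiding one Integer object per entry.
        TIntObjectHashMap<String> byId = new TIntObjectHashMap<>();
        byId.put(42, "some element parsed from XML");
        System.out.println(byId.get(42));
    }
}
```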

dbf
0

How bad is using the Java collections for such huge data?

Java Map implementations and (to a lesser extent) Collection implementations do tend to use a fair amount of memory. The effect is most pronounced when the key / value / element types are wrapper types for primitive types.
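To make the wrapper-type point concrete, here is a rough illustration; the per-entry sizes mentioned in the comments are approximate and JVM-dependent:

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingOverheadSketch {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Roughly 8 bytes per value: just the backing array.
        long[] plain = new long[n];

        // Each entry needs a boxed Integer key, a boxed Long value and an
        // internal map node object, typically several tens of bytes per entry.
        Map<Integer, Long> boxed = new HashMap<>();
        for (int i = 0; i < n; i++) {
            plain[i] = i;
            boxed.put(i, (long) i);
        }
        System.out.println(plain.length + " vs " + boxed.size() + " entries stored");
    }
}
```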

Are there any alternatives?

There are alternative implementations of "collections" of primitive types that use less memory; e.g. the GNU Trove libraries. But they don't implement the standard Java collection APIs, and that severely limits their usefulness.

If your collections don't use the primitive wrapper classes, then your options are more limited. You might be able to implement your own custom data structures to use less memory, but the saving won't be that great (in percentage terms) and you've got a significant amount of work to do to implement the code.

A better solution is to redesign your application so that it doesn't need to represent the entire XML data structure in memory. (If you can achieve this.)
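For instance, a streaming pass with the standard StAX API keeps only the current element in memory; the record element and id attribute names are placeholders for whatever your XML actually contains (they match the generator sketch above):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingXmlSketch {
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get("big-test.xml"))) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {   // placeholder element name
                    String id = reader.getAttributeValue(null, "id");
                    // Process one record at a time instead of building giant maps/lists.
                    System.out.println("record " + id);
                }
            }
            reader.close();
        }
    }
}
```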

Could the huge data cause the JVM to crash?

It could cause a JVM to throw an OutOfMemoryError. That's not technically a crash, but in your use-case it probably means that the application has no choice but to give up.

Stephen C