
I am receiving a "GC overhead limit exceeded" error when populating a HashMap with more than 100,000 objects.

When my program starts, it reads key:value pairs from a CSV file. It then builds a HashMap with a String as the key and a HashSet of objects as each value.
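For illustration, this is roughly what the population loop looks like (simplified; data.csv, the ":" delimiter and the String value type stand in for my real file and objects):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CsvIndexer {
    public static void main(String[] args) throws IOException {
        // One HashSet of values per key, keyed by the CSV's key column.
        Map<String, Set<String>> index = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("data.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(":", 2);  // each line is key:value
                index.computeIfAbsent(parts[0], k -> new HashSet<>())
                     .add(parts[1]);
            }
        }
        System.out.println(index.size() + " distinct keys");
    }
}
```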

It's convenient to keep this approach because, at the end, I print statistics based on these mappings.

I see a few options:

- Reduce object size. Would reduce the problem, but it may persist with more objects.
- Configure the map's initial size and load factor. Same as above.
- Increase heap size. Same as above.
- Process objects sequentially and discard them. Would fix the problem, but I would lose the mapping of objects.
- Offload storage to a DB?

Greatly appreciate your thoughts.

John Smith
    Did you already have a look at http://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded/? – Ravindra babu Apr 05 '16 at 18:38

1 Answer


If you can increase the heap size (e.g. via the -Xmx JVM flag), that would be the first and easiest step.


I would also try to minimize the amount of garbage for the GC to collect during the initial population.

Try to set the initial capacity to something close to what you expect the final size to be. HashMap generates lots of garbage during a re-hash/resize, which should be avoided if possible. And since a HashSet is backed by a HashMap, you should do the same with each HashSet.

(Also worth noting: HashMap doubles its capacity on each resize. That is normally not a problem, but with a very large data set that lands just over a resize threshold it could allocate double the memory actually needed.)

According to HashMap's javadoc, you should use an initial capacity of expectedSize / loadFactor to ensure that no re-hash/resize will happen. (With the default loadFactor of 0.75, that yields an initial capacity about 33% larger than expectedSize.)
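A minimal sketch of what that pre-sizing could look like (the expected sizes here are made-up placeholders for whatever you know about your CSV):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PresizedMaps {
    // Default HashMap load factor, as documented in the javadoc.
    private static final float LOAD_FACTOR = 0.75f;

    /** Capacity large enough to hold expectedSize entries without a resize. */
    private static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / (double) LOAD_FACTOR);
    }

    public static void main(String[] args) {
        int expectedKeys = 150_000;  // placeholder: rough line count of the CSV
        // Pre-sized map: no re-hash/resize garbage while populating.
        Map<String, Set<Object>> index =
                new HashMap<>(capacityFor(expectedKeys), LOAD_FACTOR);

        // The inner sets benefit from the same treatment if their size is
        // roughly known; here we guess a small per-key count.
        index.computeIfAbsent("someKey", k -> new HashSet<>(capacityFor(8)))
             .add(new Object());
    }
}
```

If Guava happens to be on your classpath already, Maps.newHashMapWithExpectedSize and Sets.newHashSetWithExpectedSize do this computation for you.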


Finally, if the heap size requirements end up being too big for your setup, the next step is to look at off-heap solutions, such as a cache from Ehcache, which you can configure with a backing persistent store if the cache takes too much memory.

Or even, as you said in your question, a database.
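If you go the Ehcache route, an Ehcache 3 setup could look roughly like this (a sketch only; the cache name, tier sizes and the /tmp path are placeholders, and both key and value types must be Serializable for the disk tier):

```java
import org.ehcache.Cache;
import org.ehcache.PersistentCacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class DiskBackedIndex {
    public static void main(String[] args) {
        // A small on-heap tier with a persistent disk tier behind it.
        PersistentCacheManager manager = CacheManagerBuilder.newCacheManagerBuilder()
                .with(CacheManagerBuilder.persistence("/tmp/index-data"))
                .withCache("index",
                        CacheConfigurationBuilder.newCacheConfigurationBuilder(
                                String.class, String.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(10_000, EntryUnit.ENTRIES)  // hot entries stay on heap
                                        .disk(1, MemoryUnit.GB, true)))   // the rest spills to disk
                .build(true);

        Cache<String, String> index = manager.getCache("index", String.class, String.class);
        index.put("someKey", "someValue");

        manager.close();
    }
}
```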

gustf