
I have ~10 million domain objects which have to stay in memory for the application's whole lifetime, but can be added or removed one by one at any time. The main storage is a HashMap<Long, MyDO>.

My processing can be done in basic foreach loops, but I can optimize some operations by creating indexes that map certain object fields, like HashMap<String, ArrayList<MyDO>>. This reduces the iteration count by 30-100x, but speeds up the processing overall by more like 2-5x.

So the question is: how much slower will GC be for ~10 million long-living objects if I store them not in one map but in five, thus creating ~5x as many references to the same objects?

UPD In short: is it feasible to use generic Java collections with boxed keys for indexes when there are ~10M objects with ~1K objects added/removed per second?
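To make the setup concrete, here is a minimal sketch of a main map plus one of the ~5 secondary indexes described above. `MyDO`, its fields, and the `Storage` class are hypothetical names invented for illustration; the question doesn't show the real code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object; the fields are assumptions for illustration.
class MyDO {
    final long id;
    final String category;
    MyDO(long id, String category) { this.id = id; this.category = category; }
}

class Storage {
    // Main storage, as in the question.
    final Map<Long, MyDO> main = new HashMap<>();
    // One secondary index: field value -> objects carrying that value.
    final Map<String, List<MyDO>> byCategory = new HashMap<>();

    void add(MyDO o) {
        main.put(o.id, o);
        byCategory.computeIfAbsent(o.category, k -> new ArrayList<>()).add(o);
    }

    void remove(long id) {
        MyDO o = main.remove(id);
        if (o != null) {
            List<MyDO> bucket = byCategory.get(o.category);
            if (bucket != null) {
                bucket.remove(o);                       // same instance, so identity-based removal works
                if (bucket.isEmpty()) byCategory.remove(o.category);
            }
        }
    }
}
```

Each index stores references to the same `MyDO` instances, not copies; the extra memory is in the map entries and boxed keys.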

float_dublin

2 Answers


There'll probably be hardly any difference. Long-living objects get promoted to the tenured generation, which gets collected only rarely. It takes a couple of collections until promotion, and until then they have to be copied from Eden into a survivor space. Here the number of links doesn't matter.

So the question is: how much slower will GC be for ~10 million long-living objects if I store them not in one map but in five, thus creating ~5x as many references to the same objects?

I'd say that the number of references as such doesn't count at all. But every map entry is itself an object. Still, 10 million doesn't sound like a big number.

UPD In short: is it feasible to use generic Java collections with boxed keys for indexes when there are ~10M objects with ~1K objects added/removed per second?

No idea, but you could avoid the boxing by using a primitive collection. Can't you simply try it out? There are three useful optimization principles:

  • Don't do it!
  • If you do, don't do it now!
  • If you do, don't do it without measurement!

It may turn out that the GC overhead is negligible and you were wasting your time.
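In the spirit of "measure first", here is a rough probe one could run, scaled down to keep it short. This is only a sketch: `System.gc()` is merely a hint to the JVM, and for real numbers you'd enable GC logging (e.g. `-verbose:gc`) instead of timing the call yourself. The `long[4]` stand-in for `MyDO` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class GcProbe {
    static final Map<Long, long[]> store = new HashMap<>();
    static final Map<Long, long[]> index = new HashMap<>();

    static void populate(int n) {
        for (long i = 0; i < n; i++) {
            long[] o = new long[4];   // stand-in for MyDO
            store.put(i, o);
            index.put(i, o);          // second map referencing the very same object
        }
    }

    public static void main(String[] args) {
        populate(1_000_000);          // scaled down from 10M to keep the run short
        long t0 = System.nanoTime();
        System.gc();                  // only a hint; use GC logs for serious measurement
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("System.gc() took ~" + ms + " ms with "
                + store.size() + " objects held in 2 maps");
    }
}
```

Comparing the run with one, two, and five maps would directly answer how much the extra entry objects cost.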


References get used to mark an object as "in use", but once an object is marked, additional references do nothing. Of course they have to be inspected, but this overhead is to be counted against the referrer rather than the referee. So if you create one million references to a single object, it's the million referring objects that cost you time, not the single object.
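The point above can be demonstrated directly: putting one object into five maps creates five entry objects, but all of them point at the exact same instance, so nothing about the object itself is duplicated. A small sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedRefs {
    public static void main(String[] args) {
        Object payload = new long[128];       // one sizeable object
        List<Map<Integer, Object>> maps = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Map<Integer, Object> m = new HashMap<>();
            m.put(42, payload);               // adds an entry object, not a copy of payload
            maps.add(m);
        }
        for (Map<Integer, Object> m : maps) {
            System.out.println(m.get(42) == payload);  // true: same instance in every map
        }
    }
}
```

What the GC pays for per extra map is the entry (and boxed key) per element, plus scanning the referrer; the referenced object is marked once regardless.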

maaartinus
  • If the number of references doesn't matter, how do objects in the old gen get collected at all? Shouldn't the GC check for references to them prior to collection to determine whether they can be collected? And aren't objects in the old gen considered GC roots for objects in the young gen? – float_dublin Apr 05 '14 at 09:12
  • 1. In about the same way as in the new gen: the number of references is irrelevant, the fact that it's non-zero is what counts. 2. The GC has to scan all reachable objects. 3. There's special handling for references from old to new, see [here](http://stackoverflow.com/questions/19154607/how-actually-card-table-and-writer-barrier-works). – maaartinus Apr 05 '14 at 09:29
  • Let's say we have two arrays, Object[] a1 = new Object[10] and its clone a2 — wouldn't the GC have to iterate over them both? – float_dublin Apr 05 '14 at 11:24
  • 1
    @float_dublin: Yes, you have two objects and therefore double work. The fact that their content is the same doesn't make it any worse. – maaartinus Apr 05 '14 at 12:27

I'm not sure if this applies here, but if you are really worried about the GC and you'd like better control over the behavior of your derived maps, and thus their influence on GC performance, in my opinion you should take a look at the different kinds of references in Java (strong, soft, weak, phantom).
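As a minimal sketch of that idea (class and field names are invented for illustration): the main map keeps strong references, while a derived index holds `WeakReference`s, so the index alone never keeps an object alive after it has been removed from the main map.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class WeakIndex {
    final Map<Long, Object> main = new HashMap<>();                   // strong: owns the objects
    final Map<String, WeakReference<Object>> index = new HashMap<>(); // weak: derived lookup only

    void add(long id, String key, Object o) {
        main.put(id, o);
        index.put(key, new WeakReference<>(o));
    }

    Object lookup(String key) {
        WeakReference<Object> ref = index.get(key);
        Object o = (ref == null) ? null : ref.get();
        if (o == null) index.remove(key);  // cleared by the GC; drop the stale entry lazily
        return o;
    }
}
```

The trade-off is an extra `WeakReference` object per index entry and the need to clean up cleared entries, so this helps correctness (no leaks via indexes) more than raw GC speed.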

Also remember that premature optimization is the root of all evil, especially in programming.

bazeusz