I am currently working on a project where we have a set of events. One piece of analysis we do on the events is to look through a specific type of event and check to see if it's likely that it was prompted by another event which happened shortly before (or slightly after in one odd case). Each of these events can only be effected by a single event, but one event could be the causal event for multiple events. We want this association to go both ways so that, from any particular method, you can go straight to the event which caused it, or one of the events which it caused. Based on that, I started by adding the following properties to the Event objects:
protected Event causalEvent;
protected List<Event> effectedEvents;
After a bit of thinking, I considered that we never want the same item added twice to the effectedEvents list. After reading the answer to Preventing Duplicate List<T> Entries, I went with a HashSet.
protected Event causalEvent;
protected HashSet<Event> effectedEvents;
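For what it's worth, here is a minimal sketch of how the two-way link could be kept consistent with the HashSet version. The AddEffectedEvent method and the read-only accessors are hypothetical names made up for this illustration, not our actual code; the only library behaviour it relies on is that HashSet<T>.Add returns false when the item is already present.

```csharp
using System;
using System.Collections.Generic;

public class Event
{
    protected Event causalEvent;
    protected HashSet<Event> effectedEvents = new HashSet<Event>();

    // Hypothetical read-only accessors, added only for this sketch.
    public Event CausalEvent { get { return causalEvent; } }
    public int EffectedCount { get { return effectedEvents.Count; } }

    // Hypothetical helper: links 'effected' to this event in both directions.
    // Returns false if the link already exists or if 'effected' already has
    // a different causal event (each event may have only one cause).
    public bool AddEffectedEvent(Event effected)
    {
        if (effected == null)
            throw new ArgumentNullException("effected");

        if (effected.causalEvent != null && !ReferenceEquals(effected.causalEvent, this))
            return false;

        // HashSet<T>.Add returns false for a duplicate, so no Contains check is needed.
        if (!effectedEvents.Add(effected))
            return false;

        effected.causalEvent = this;
        return true;
    }
}
```

With this shape, calling AddEffectedEvent twice with the same event leaves the set with a single entry, which is the uniqueness guarantee I was after.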
A co-worker and I got to discussing the code I'd added, and he pointed out that using a HashSet might confuse people, since he tended to see a HashSet and assume that there's a great deal of data. In our case, due to the rules being used in the algorithms, effectedEvents is going to have 0 items in about 90% of the cases, 1 item in 9%, and 2 maybe 1% of the time. Almost never will we have more than 2 items, although it is possible. With so few items, I believe the lookup cost is effectively the same for both collections. The amount of memory used is very similar since both start assuming a small capacity (although, I will concede that List gives you the ability to set that capacity in the constructor while HashSet only allows one to trim the value down based on its contents, "rounded to an implementation-specific value").
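For comparison, this is roughly what the List version would need in order to give the same uniqueness guarantee. ListBackedEvent and AddEffectedEvent are hypothetical names for this sketch only. List<T>.Contains is a linear scan rather than a hash lookup, but with 0 to 2 items I would not expect that to matter.

```csharp
using System;
using System.Collections.Generic;

public class ListBackedEvent
{
    protected ListBackedEvent causalEvent;

    // List<T> accepts an initial capacity in its constructor; 2 is an
    // assumption chosen here because we almost never exceed 2 items.
    protected List<ListBackedEvent> effectedEvents = new List<ListBackedEvent>(2);

    // Hypothetical read-only accessor, added only for this sketch.
    public int EffectedCount { get { return effectedEvents.Count; } }

    // Hypothetical helper: the duplicate check has to be written by hand,
    // because List<T>.Add accepts duplicates.
    public bool AddEffectedEvent(ListBackedEvent effected)
    {
        if (effected == null)
            throw new ArgumentNullException("effected");

        // Linear scan over the list; cheap for a handful of items.
        if (effectedEvents.Contains(effected))
            return false;

        effectedEvents.Add(effected);
        effected.causalEvent = this;
        return true;
    }
}
```

The behaviour is the same from the caller's point of view; the difference is that the HashSet enforces uniqueness itself, while here it depends on every caller going through the helper.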
So, long question short, is there any real penalty to using a HashSet other than possible confusion for those unfamiliar with using it to ensure uniqueness?