79

Which of the following collection types do you use in your JPA domain model and why:

  • java.util.Collection
  • java.util.List
  • java.util.Set

I was wondering whether there are some ground rules for this.

UPDATE: I know the difference between a Set and a List. A List allows duplicates and has an order, whereas a Set cannot contain duplicate elements and does not define an order. I'm asking this question in the context of JPA. If you strictly follow the definition, you should always end up using the Set type, since your collection is stored in a relational database, where you can't have duplicates and where you have to define an order yourself, i.e. the order in your Java List is not necessarily preserved in the DB.

For example, most of the time I'm using the List type, not because it has an order or allows duplicates (which I can't have anyway), but because some of the components in my component library require a list.

Theo
  • I believe you might find the @OrderBy annotation useful and interesting. First link from Google about it: http://www.objectdb.com/api/java/jpa/OrderBy – Grzegorz Oledzki Jan 11 '11 at 08:35
  • @Grzegorz Oledzki I know the `@OrderBy` annotation, but it has nothing to do with the order in your `List`. If you retrieve your entity list (which is annotated with `@OrderBy`), change its order, merge it to the DB and retrieve it again, will the order you changed be preserved? No! You will get the same order you've defined via `@OrderBy`. – Theo Jan 11 '11 at 08:46
  • I agree this would be great. But you are half-way there. When you read such an entity, you'll get the proper ordering. – Grzegorz Oledzki Jan 11 '11 at 13:50
  • The @OrderColumn annotation maps to an order column in the database, used specifically to preserve the order of elements in a List when you change them in memory. Downside: changing the position of one element can cause updates to potentially all rows, in order to keep the order column consistent with the order in memory. – German Apr 25 '13 at 17:41

6 Answers

55

As your own question suggests, the key is the domain, not JPA. JPA is just a framework which you can (and should) use in the way that best fits your problem. Choosing a suboptimal solution because of a framework (or its limits) is usually a warning bell.

When I need a set and never care about order, I use a Set. When order is important for some reason (ordered list, ordering by date, etc.), I use a List.

You seem to be well aware of the difference between Collection, Set, and List. Which one to use depends only on your needs. You can use them to communicate to users of your API (or your future self) the properties of your collection (which may be subtle or implicit).

This follows the exact same rules as using different collection types anywhere else in your code. You could use Object or Collection for all your references, yet in most cases you use more concrete types.

For example, when I see a List, I know it comes sorted in some way, and that duplicates are either acceptable or irrelevant for this case. When I see a Set, I usually expect it to have no duplicates and no specific order (unless it's a SortedSet). When I see a Collection, I don't expect anything more from it than to contain some entities.
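A minimal plain-Java sketch of those expectations, with no JPA involved. The class name `CollectionContracts` and the helper `addTags` are made up for illustration; the point is only that the declared type tells the reader what happens to duplicates and whether positions mean anything.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CollectionContracts {
    // Adds the same three tags to whatever collection is passed in and returns its size.
    // Typed as Collection on purpose: this code promises nothing about order or uniqueness.
    static int addTags(Collection<String> target) {
        target.add("jpa");
        target.add("java");
        target.add("jpa"); // duplicate on purpose
        return target.size();
    }

    public static void main(String[] args) {
        List<String> asList = new ArrayList<>();
        Set<String> asSet = new LinkedHashSet<>();

        System.out.println(addTags(asList)); // 3: the List keeps the duplicate
        System.out.println(addTags(asSet));  // 2: the Set drops it
        System.out.println(asList.get(2));   // jpa: positional access exists only on List
    }
}
```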

Regarding list ordering... Yes, it can be preserved. And even if it's not and you just use @OrderBy, it can still be useful. Think of an event log sorted by timestamp by default. Artificially reordering the list makes little sense, but it is still useful that it comes sorted by default.

Konrad Garus
  • Shouldn't you end up always using `Set` then? Because you are storing your entities in a relational database, where there can't be any duplicate elements and where you have to define an order yourself (i.e. the order in your Java `List` is not preserved when it is persisted). And when do you use `Collection`? – Theo Jan 11 '11 at 08:11
  • If necessary, you can persist `List` order using an artificial field. In most cases, though, it is natural. One example can be some kind of a time-based log, where you can use @OrderBy("eventDate"). As for `Collection`, I would use it in similar situations as `Set`. – Konrad Garus Jan 11 '11 at 08:32
  • Yes, you can use an index column or the `@OrderBy` annotation, but the order specified by these means is not related to the order in your Java list. You can also use the `@OrderBy` annotation or an index column on a `Set`. And if you change the order in your Java list, the change will not be reflected in the DB. – Theo Jan 11 '11 at 09:05
  • See updated answer. In short, follow the exact same rules you use everywhere else in your application. – Konrad Garus Jan 11 '11 at 13:26
  • One case I have seen that muddies the water is JSF. It does not support the Set interface for entity collections, only List. This is fine on the view side but not storage (generally). So the view must be a List in this case, constantly translated to/from the Set unfortunately. – Darrell Teague Mar 10 '13 at 01:58
  • Great answer! I'd like to hear your take on how to implement hashCode/equals for entities. The problem I ran into is that even if I have a business identifier and use it to implement hashCode/equals, JPA providers like Hibernate will try and stick uninitialized entities into sets and thus the hashCode in that case will return a different value than after initialization. Is this acceptable? Should I return super.hashCode() if my business key wasn't initialized? Thanks! – Giovanni Botta Jul 26 '13 at 21:38
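The hashCode concern raised in the last comment can be reproduced with plain collections. The sketch below is not JPA code: `Person` is a hypothetical entity whose `equals`/`hashCode` depend on a single `id` field, and setting the field by hand stands in for the moment the value gets initialized. It only shows the symptom, that a `HashSet` can no longer find an element whose hash changed after insertion.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashCodeDemo {
    // Hypothetical entity: equality and hash depend only on the id.
    static final class Person {
        Long id; // null until it is initialized

        @Override
        public boolean equals(Object o) {
            return o instanceof Person && Objects.equals(id, ((Person) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id); // 0 while id is null
        }
    }

    // Returns whether the set still finds the entity after its id changes.
    static boolean foundAfterIdAssigned() {
        Set<Person> people = new HashSet<>();
        Person p = new Person();
        people.add(p); // stored under the hash of a null id
        p.id = 42L;    // stands in for the id being initialized later
        return people.contains(p);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssigned()); // false
    }
}
```

The element is still in the set (its size is 1), but lookups use the new hash and miss it.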
47

The question of using a Set or a List is much more difficult, I think, at least when you use Hibernate as the JPA implementation. If you use a List in Hibernate, it automatically switches to the "bag" paradigm, where duplicates CAN exist.

And that decision has a significant influence on the queries Hibernate executes. Here is a little example:

There are two entities, Employee and Company, in a typical many-to-many relation. To map those entities to each other, a join table (let's call it "employeeCompany") exists.

You choose the data type List on both entities (Company/Employee).

So if you now decide to remove Employee Joe from CompanyXY, Hibernate executes the following queries:

delete from employeeCompany where employeeId = Joe;
insert into employeeCompany(employeeId,companyId) values (Joe,CompanyXA);
insert into employeeCompany(employeeId,companyId) values (Joe,CompanyXB);
insert into employeeCompany(employeeId,companyId) values (Joe,CompanyXC);
insert into employeeCompany(employeeId,companyId) values (Joe,CompanyXD);
insert into employeeCompany(employeeId,companyId) values (Joe,CompanyXE);

And now the question: why doesn't Hibernate just execute this single query?

delete from employeeCompany where employeeId = Joe AND companyId = CompanyXY;

The answer is simple (and thanks a lot to Nirav Assar for his blog post): it can't. In a world of bags, deleting all rows and re-inserting the remaining ones is the only proper way! Read the post for more clarification: http://assarconsulting.blogspot.fr/2009/08/why-hibernate-does-delete-all-then-re.html

Now the big conclusion:

If you choose a Set instead of a List in your Employee/Company entities, you don't have that problem and only one query is executed!

And why is that? Because Hibernate is no longer in a world of bags (as you know, Sets allow no duplicates), so executing only one query is now possible.

So the decision between List and Set is not that simple, at least when it comes to queries and performance!
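One way to see why a targeted delete is unsafe for a bag is a toy model in plain Java. This is not Hibernate's implementation; the class `BagDeleteDemo` and both helper methods are invented for illustration, and the join-table rows for Joe are reduced to a list of company ids. The assumption is that the bag contains the same company twice and the rows have no identifier of their own.

```java
import java.util.ArrayList;
import java.util.List;

class BagDeleteDemo {
    // Simulates "DELETE ... WHERE companyId = ?" on Joe's rows: every matching row goes.
    static List<String> deleteByKey(List<String> rows, String companyId) {
        List<String> remaining = new ArrayList<>(rows);
        remaining.removeIf(companyId::equals);
        return remaining;
    }

    // Simulates deleting all of Joe's rows and re-inserting the in-memory collection.
    static List<String> deleteAllAndReinsert(List<String> inMemory) {
        return new ArrayList<>(inMemory);
    }

    public static void main(String[] args) {
        // Joe's companies stored as a bag: "XY" appears twice.
        List<String> rows = List.of("XA", "XY", "XY");

        // The application removes ONE occurrence of XY from the in-memory list.
        List<String> inMemory = new ArrayList<>(rows);
        inMemory.remove("XY");

        System.out.println(inMemory);                       // [XA, XY]
        System.out.println(deleteByKey(rows, "XY"));        // [XA]
        System.out.println(deleteAllAndReinsert(inMemory)); // [XA, XY]
    }
}
```

The keyed delete removes both XY rows and no longer matches the in-memory state, while delete-all-then-reinsert does. With a Set the duplicate cannot exist, so the keyed delete is safe.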

Ursin Brunner
8

I generally use a List. I find the List API far more useful and compatible with other libraries than Set. List is easier to iterate and generally more efficient for most operations and memory.

The fact that a relationship cannot have duplicates and is not normally ordered should not require usage of a Set; you can use whatever Collection type is most useful to your application.

It depends on your model, though: if it is something you are going to do a lot of contains checks on, then a Set would be more efficient.

You can order a relationship in JPA using either @OrderBy or @OrderColumn.

See http://en.wikibooks.org/wiki/Java_Persistence/Relationships#Ordering

Duplicates are not generally supported in JPA, but some mappings such as ElementCollections may support duplicates.

kellyfj
James
4

I use:

  • Set: when the items in the collection have no order and are unique
  • List: when the items have an order
Ralph
  • Shouldn't you end up always using Set then? Because you are storing your entities in a relational database, where there can't be any duplicate elements and where you have to define an order yourself (i.e. the order in your Java List is not preserved when it is persisted). And when do you use Collection? – Theo Jan 11 '11 at 08:23
  • What do you mean `can't be any duplicate elements`? Of course it can. You just have a primary key as Id field, and rest can be duplicate. – Shervin Asgari Jan 11 '11 at 08:29
  • @Theo Set equality on objects relies on the equals method. DB equality relies purely on the primary key(s). These are not necessarily the same. – GaryF Jan 11 '11 at 08:47
  • @Shervin That depends on what you define as a duplicate. In most of the cases, or in my cases, respectively, if the primary keys are different, then it is not a duplicate. And I'm also using the primary key to implement `equals` and `hashCode`. – Theo Jan 11 '11 at 08:51
  • @Theo: whether elements are unique depends a bit on whether the database primary key(s) match the Java equals implementation, but in most cases you are right. But the order of a Java list can be preserved in the database! (see Index Column) – Ralph Jan 11 '11 at 08:53
  • @GaryF That's true, and I'm aware of that. So far, I've always used the PK to implement `equals` and `hashCode` and my definition of duplicate is "two entities are equal if the PK is equal". I thought this was the common case, but after looking at the comments, it looks like it's not that common... – Theo Jan 11 '11 at 08:54
  • @Ralph Yes, you need an additional column to preserve the order, but this, again, is not related to the order you have in your Java list. – Theo Jan 11 '11 at 08:57
  • "the order in you Java List is not preserved when it is persisted" ? Not true. In JPA2 you can preserve the order in the Java List .. indexed lists, as opposed to "ordered lists" (where you provide that order by clause) – DataNucleus Jan 11 '11 at 09:02
  • @Theo what are you talking about? Of course I can preserve the order of items in a list in the database: the JPA 2.0 annotation is called OrderColumn (http://wiki.eclipse.org/EclipseLink/Examples/JPA/2.0/OrderColumns) – Ralph Jan 11 '11 at 09:08
  • @DataNucleus True that. But again, that is a different order. It's the order specified by the index (and not the list order). You could also have an index column in a collection of type `Collection`. I was looking for arguments for when to use the Collection, List and Set types in JPA associations. – Theo Jan 11 '11 at 09:09
  • @Theo: The list "index" is mapped to the database index column (and back) - so the order in the list is preserved. – Ralph Jan 11 '11 at 09:12
  • @Theo, no it isn't. When you define an "indexed list" in JPA2, the positions from the java.util.List are put into that artificial column, with origin 0. Consequently the order of the list is preserved ... which is the whole point of transparent persistence. You use List when you need ordering, fact, irrespective of your persistence technology – DataNucleus Jan 11 '11 at 09:53
  • @Ralph OK, I should've read the doc on `@OrderColumn`. I thought it is your responsibility to update the index column, but apparently, this is done by the persistence provider. So, would you say that you use a `List` only if you use `OrderColumn` and for the rest of the cases you use `Set`? When do you use `Collection`? – Theo Jan 11 '11 at 09:58
  • @Theo: Yes, and I have never used Collection for any persistent class. BTW: (at least for Hibernate) there is another collection data type: SortedSet – Ralph Jan 11 '11 at 10:23
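To make the indexed-list discussion concrete, here is a small plain-Java model of an order column, assuming (as described in the comments above) that each element's list position, starting at 0, is what gets written to that column. `OrderColumnDemo` and `rowsToUpdate` are hypothetical names, and how many UPDATE statements a real provider issues is provider-specific; the sketch only counts how many rows end up with a different index value after a reorder.

```java
import java.util.ArrayList;
import java.util.List;

class OrderColumnDemo {
    // Counts rows whose index-column value differs between two versions of a list.
    // Elements are assumed to be unique, so indexOf identifies each element's row.
    static int rowsToUpdate(List<String> before, List<String> after) {
        int changed = 0;
        for (String item : after) {
            if (before.indexOf(item) != after.indexOf(item)) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> before = List.of("a", "b", "c", "d", "e");

        // Move the last element to the front; every other element shifts by one.
        List<String> after = new ArrayList<>(before);
        after.add(0, after.remove(after.size() - 1));

        System.out.println(after);                       // [e, a, b, c, d]
        System.out.println(rowsToUpdate(before, after)); // 5
    }
}
```

Moving a single element changes the stored position of all five rows in this model, which is the downside of an order column mentioned in the comments on the question.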
2

https://issues.apache.org/jira/browse/OPENJPA-710

Choosing between List & Set should have an impact on how the query is sent to DB.

  • can you please expand your answer and briefly explain what is the impact? – Maksym Rudenko Dec 24 '20 at 14:46
  • http://openjpa.208410.n2.nabble.com/Removal-of-unnecessary-quot-Order-By-quot-clauses-td7315925.html I have seen OpenJPA add an extra Order By clause to the generated query. This has a performance impact on the database for queries that do not need any order. According to the link above, this should happen only when the collection is defined as a List; if it is defined as a Set, the Order By clause should be omitted. Unfortunately, I could not get rid of the Order By clause irrespective of which data type I chose. – Dyutiman Chaudhuri Dec 26 '20 at 16:39
0

I think using Collection as the generic default when generating entities with NetBeans is a good starting point. Then, when you figure out what your model actually is and need more functionality, you can easily change it and stay backwards compatible.

AmanicA