I am running a real life scenario in our database and thus I have no room to alter its structure in case that this comes around as a suggestion. Here is the Structure in question:
EntityA {
@Id
.......
@OneToMany(mappedBy = "xxxx", fetch = FetchType.LAZY)
private Set<EntityB> entityB;
@Column(name = "partition", columnDefinition = "nchar", length = 3)
private String partitionKey;
}
EntityB {
@Id
..........
@Column(name = "partition")
private String partitionKey;
@ManyToOne(fetch = FetchType.LAZY) //EAGER is the default for the to-one
@JoinColumn(name = "bbbb")
private EntityA entityA;
@OneToMany(mappedBy = "EntityCPk.entityB", fetch = FetchType.LAZY)
private Set<EntityC> entityC;
@OneToOne(mappedBy = "cccc", cascade = CascadeType.ALL)
@PrimaryKeyJoinColumn
private EntityD entityD;
}
EntityC {
@EmbeddedId
private EntityCPk entityCPk;
@Embeddable
public static class EntityCPk implements Serializable {
@Column( name = "partition")
private String partitionKey;
@ManyToOne
@JoinColumn(name = "xxxx")
private EntityB entityB;
@ManyToOne
@JoinColumn(name = "xxxx")
private EntityE entityE;
}
}
EntityD {
@id
.........
@MapsId
@OneToOne
@JoinColumn(name = "zzzz", columnDefinition = "nchar")
@PrimaryKeyJoinColumn
private EntityB entityB;
}
EntityE {
@id
.........
@OneToMany(mappedBy = "yyyPk.entityE")
private Set<EntityC> entityC;
}
Now the requirement is to run a query in one go with joins and avoid 1+N scenarios. I assume as far as I have seen that the .LAZY or EAGER annotations are "overwritten" when using the Query annotation within the repository along with the FETCH option. So here is what I have achieved so much (entityXXX and entityYYY do not interfear with our case so I just mention them):
First Attempt with FetchType.LAZY in EntityB.entityC property:
"select a" +
"from EntityA a " +
"join FETCH a.entityB bs" +
"join FETCH bs.entityXXX as xxx " +
"join FETCH bs.entityYYY as yyy " +
"join FETCH bs.entityD latest " +
"where a.aProp in ( :props ) " +
"and xxx.id = 4 " +
"and yyy.id = 10" +
"and latest.entityB.aProp between :date and :date "
Results, as expected. I get 1 query BUT I get no collection returned in EntityB.entityC due to the lazy annotation and of course it is not present in the query. If I change the the EntityB.entityC to FetchType.EAGER then I get as expected 3 queries. One is the main and N per entityC in Set. So I guess the next step is to join entityC:
Second Attempt:
"select a " +
"from EntityA a " +
"join FETCH a.entityB bs" +
"join FETCH bs.entityXXX as xxx " +
"join FETCH bs.entityYYY as yyy " +
"join FETCH bs.entityD as latest " +
"join bs.entityC as cs " + //please note I am not using FETCH yet
"where a.aProp in ( :props ) " +
"and c.entityCPk.partitionKey = a.partitionKey " +
"and xxx.id = 4 " +
"and yyy.id = 10" +
"and latest.entityB.aProp between :date and :date "
The result is unexpected and I think it has been reported here as well. What I get now is multiples of a(s) all references to the same object equals to the sum of the amount of bs.entityC. So if for example a-1 -> has 1-bs -> has 17 cs and a2 -> has 1-bs -> has 67 cs then I end up with a result set of 84 a objects all the same! This is question one. Why is this happening?
Question 2 is that if I use the FETCH in my new join then I am still not getting my 1 query and now I am not getting exactly multiple instances of A but multiple instances of some kind of Wrappers with a handler property that has references to A makred as EntityA_$$_jvstbc0_4@.
Just to give some insight to the database structure, I am more than sure that this schema started as a many-to-many relationship with a lookup table being EntityB between EntityA and EntityC. I may try to tackle the issue using JoinTable on EntityC joining on partitionKey and id of EntityB while EntityA has the partitionkey and its Id to map on EntityB. However i am not very hopeful of this solution as EntityB has been contaminated with other columns over time which needs to be selected uppon and I am not sure how can I do this.
UPDATE 1: I can see that when join FETCH is used for cs it is augmenting the resultant SQL select with the columns that are necessary i.e. to populate the cs children. Running the query manually I am getting correctly the sum of children as rows. Which makes sense SQL wise but hibernate should have been able to aggregate the additional rows based on their properties. Right enough without the join FETCH I am getting only rows equals to the amount of a. So my second though is that somehow I need to instruct Hibernate to aggregate manually(?)
UPDATE 2: Change of strategy. Instead of starting following an SQL logic, we better have to answer to the following question: Which Class/Entity will give us the granularity we are looking for. In the previous examples we were starting from EntityA trying to limit its children to fit our expected results. However as it has been pointed, this is effectively a corruption of the objects. You cannot "limit" the children cause they all belong to the Entity and fetching a subset of them you run the risk of deleting(?) data. So the approach must be to get the children objects we are interested that point to the parent entities. That we don't alter the data. So here is a query that returns the correct amount of object. No distinct or inexplicable multiplicities:
"select c " +
"from EntityC c " +
"inner join c.EntityCPk.EntityB.EntityD latest " +
"join latest.EntityB.EntityXXX xxx " +
"join latest.EntityB.EntityYYY yyy " +
"join fetch c.EntityCPk.EntityB " +
"where latest.EntityB.EntityA.Id in ( :param ) " +
"and latest.EntityB.aField between :paramA and paramB "
So this seems to answer the issue of the multiplicity of the previous examples as every row is based on the "finer" child object that resolves its parent via the -ToOne relationship. Also there are no more dangerous aliases in the join fetch. There is only one more issue. It introduces a 1+N query problem for EntityB that I cannot get rid off.