1

I use JPA (through Spring Data) with a postgress DB. I have a big entity, very central to my system. The entity has multiple associations (OneToMany, ElementCollection etc).

My typical usecase is returing a paginated list of those entities from my API. The API model needs information from the associated fields, so I must somehow fetch them for each entry of the current page.

Depending on the DB query, I use two ways to fetch the original list:

  1. Typical JPQL query from my spring repository (language is kotlin):
  @Query("select c from Company c where c.state = :state and :country member of c.targetCountries")
  fun findAllByCountryAndState(country: Country, state: CompanyState, page: Pageable): Page<Company>
  1. For more complex cases I use Criteria Query by extending JpaSpecificationExecutor.

My Question is what is the best way to combine pagination with fetching associated fields? The usecase seems pretty standard to me, but trying to implement this sends me from one problem to the next.

What I have tried so far:

  1. Carefully choose which associated fields are commonly needed in lists and mark them as eagerly fetched. The pagination works fine here, but unfortunately is that I run into N+1 problem. I could see tones of SQL queries being generated, one for each entity and associated field. So I followed the advice here and this leads me to:

  2. I used @NamedEntityGraph or join fetchs to fetch the needed associations from the very start. This worked great, but compromised my pagination. I got the following warning: HHH000104: firstResult/maxResults specified with collection fetch. This seemed not so optimal to me, so I tried to solve this using this solution. This mentions to either generate some massive Native queries for each usecase (a maintenance horror which I cannot afford), or to:

  3. use a 2-queries approach. First fetch the ids only (pagination, no join fetch) an then fetch the whole entities (join fetch, no pagination). This was a dead-end for me because JpaSpecificationExecutor does not support selecting of ids only). And I often need the flexibility of Criteria API where clauses. So I though of a slight variation of this and got:

  4. Fetch the whole entity in the first query without associations, then fetch the associated fields only in a seconadary query. "Glue" them together using the id and deliver the final result. The problem is that fetching the associated field only was not possible. I have to fetch the whole entity a second time, which seems like a big overhead, especially considering that the entity is pretty large.

  5. I also tried the select new MyFieldsOnlyPlease(c.id, MORE_FIELDS) from Company c join fetch c.SOME_ASSOCIATION approach, in which case I get the following error: query specified join fetching, but the owner of the fetched association was not present in the select list.

So my current solution is number 4, which seems clearly sub-optimal. To summarize: what is the best practices / best tradeoffs for pagination + associations? This usecase seems pretty standard to me, so I imagine there should be some good/elegant solution that I cannot see.

Alkis Mavridis
  • 1,090
  • 12
  • 28
  • 1
    There is no nice way to do this in JPA, so you'll have to resort to native provider specific behavior. I have not found a Hibernate equivalent, but when solving this on a similar project, I used EclipseLink's batchFetch annotation see https://www.eclipse.org/eclipselink/documentation/2.4/jpa/extensions/a_batchfetch.htm . This uses separate queries for relationships, but batches them all based on the original query or use an IN operation to limit to maybe only the 100 results returned in a page. I found this more efficient with large clobs in the model to avoid duplicating them over the network – Chris Nov 01 '22 at 14:50
  • @Chris that souds very promising! I just checked and figure out that there is a feature in Hibernate too called BatchSize (annotation). It reduces the 1+N to 1 + AmountOfBatches (using an IN clause). This is a massive improvement! I will try all usecases and write feedback. Thanks a lot – Alkis Mavridis Nov 01 '22 at 16:43
  • I can confirm that @BatchSize annotation covered all my usecases. It works perfectly with both eager and lazy loading – Alkis Mavridis Nov 01 '22 at 23:45

1 Answers1

1

Using two queries or nesting the first query as subquery into the main query is usually the way to go, but it's non-trivial to implement correctly. I would suggest you look into Blaze-Persistence for pagination.

I think this is a perfect use case for Blaze-Persistence Entity Views.

I created the library to allow easy mapping between JPA models and custom interface or abstract class defined models, something like Spring Data Projections on steroids. The idea is that you define your target structure(domain model) the way you like and map attributes(getters) via JPQL expressions to the entity model.

A DTO model for your use case could look like the following with Blaze-Persistence Entity-Views:

@EntityView(Company.class)
public interface CompanyDto {
    @IdMapping
    Long getId();
    String getName();
    Set<CountryDto> getTargetCountries();

    @EntityView(Country.class)
    interface CountryDto {
        @IdMapping
        Long getId();
        String getName();
    }
}

Querying is a matter of applying the entity view to a query, the simplest being just a query by id.

CompanyDto a = entityViewManager.find(entityManager, CompanyDto.class, id);

The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features

Page<CompanyDto> findAll(Pageable pageable);

The best part is, it will only fetch the state that is actually necessary!

Christian Beikov
  • 15,141
  • 2
  • 32
  • 58