What is the best practice to create repository on Spring Boot?

Question

I want to create a one-to-many mapping, for example a Post that has many Comments. I see two ways to add comments. The first is to create a repository for Comment; the second is to use PostRepository, load the post, and add the comment to it. Each solution has its own challenges.

In the first solution, creating a repository per entity increases the number of repositories too much, and according to DDD, repositories should be created only for aggregate roots.

In the second solution, there are performance issues. To load, add or remove nested entities, the root entity must be loaded first. To add entity, other related entities like User Entity in Comment Entity must be loaded from userRepository. As a result, these additional loadings cause a decrease in speed and total performance.

What is the best practice to load, add or remove nested entities?

File Post.java

@Entity
@Table(name = "posts")
@Getter
@Setter
public class Post
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Size(max = 250)
    private String description;

    @NotNull
    @Lob
    private String content;

    @OneToMany(mappedBy = "post", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    private Set<Comment> comments = new HashSet<>();

    @ManyToOne(fetch = FetchType.LAZY, optional = false)
    @JoinColumn(name = "user_id", nullable = false)
    @OnDelete(action = OnDeleteAction.CASCADE)
    private User user;
}

File Comment.java

@Entity
@Table(name = "comments")
@Getter
@Setter
public class Comment {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @NotNull
    @Lob
    private String text;

    @ManyToOne(fetch = FetchType.LAZY, optional = false)
    @JoinColumn(name = "post_id", nullable = false)
    @OnDelete(action = OnDeleteAction.CASCADE)
    private Post post;

    @ManyToOne(fetch = FetchType.LAZY, optional = false)
    @JoinColumn(name = "user_id", nullable = false)
    @OnDelete(action = OnDeleteAction.CASCADE)
    private User user;
}

File User.java

@Entity
@Table(name = "Users")
@Getter
@Setter
public class User
{   
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id; 

    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    private Set<Comment> comments = new HashSet<>();

    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    private Set<Post> posts = new HashSet<>();
}

The main problem is the assumption that there is a `@OneToMany` relationship between posts and comments; that is wrong. Yes, they are related somehow, but in JPA terms `@OneToMany` is actually `@OneToFew`, as in "a person may be reachable by a couple of phone numbers"; assuming the same about posts and comments is incorrect. - Andrey B. Panfilov, Sep 06 '22 at 06:18
You can look into unidirectional mappings to avoid unnecessary queries; currently you are using a bidirectional approach. If you have a really large number of posts and comments, you can create a repository to fetch them properly. If you don't, you can simply keep the bidirectional mappings and call the getter to fetch the data from the database. - Gurkan İlleez, Sep 06 '22 at 07:07

score 2 · Answer 1 · answered Sep 06 '22 at 07:40

"Best" is not well defined, but here is what can probably be considered the canonical stance of the Spring Data team on this question.

You definitely should NOT have one repository per entity (see "Are you supposed to have one repository per table in JPA?").

The reason is certainly not that you'd have too many classes/interfaces. Classes and interfaces are really cheap to create, both at implementation time and at run time. It is hard to have so many of them that they pose a significant problem. And if they did, the entities themselves would already cause a problem.

The reason is that repositories handle aggregates, not entities, although admittedly the difference is hard to see in JPA-based code. So your question boils down to: what should be an aggregate?

At least part of the answer is already in your question:

In the second solution, there are performance issues. To load, add or remove nested entities, the root entity must be loaded first. To add entity, other related entities like User Entity in Comment Entity must be loaded from userRepository. As a result, these additional loadings cause a decrease in speed and total performance.

The concepts of aggregate and repository are widely adopted in the microservice community because they lead to good scalability. This certainly isn't the same as "speed and total performance", but it is certainly related.

So how do these two views go together? Andrey B. Panfilov is onto something with their comment:

@OneToMany is actually @OneToFew like "person may be reachable by a couple of phone numbers".

But it only describes a heuristic.

The real rule is: An aggregate should group classes that need to be consistent at all times. The canonical example is a purchase order with its line items. Line items on their own don't make sense. And if you modify a line item (or add/remove one) you might have to update the purchase order, for example in order to update the total price or in order to maintain constraints like a maximum value. So purchase order should be an aggregate including its line items.
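
The purchase order rule can be sketched in plain Java, without any persistence framework. The class names, the `maxValue` limit, and the method names are assumptions made for illustration; the point is only that every change to the line items passes through the aggregate root, which keeps the total and the maximum-value constraint consistent.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A line item makes no sense outside its order, so it lives inside the aggregate.
final class LineItem {
    final String product;
    final int quantity;
    final BigDecimal unitPrice;

    LineItem(String product, int quantity, BigDecimal unitPrice) {
        this.product = product;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }

    BigDecimal subtotal() {
        return unitPrice.multiply(BigDecimal.valueOf(quantity));
    }
}

// Aggregate root: the only entry point for adding or removing line items.
final class PurchaseOrder {
    private final BigDecimal maxValue; // hypothetical business limit
    private final List<LineItem> items = new ArrayList<>();
    private BigDecimal total = BigDecimal.ZERO;

    PurchaseOrder(BigDecimal maxValue) {
        this.maxValue = maxValue;
    }

    void addItem(LineItem item) {
        BigDecimal newTotal = total.add(item.subtotal());
        if (newTotal.compareTo(maxValue) > 0) {
            throw new IllegalStateException("order would exceed " + maxValue);
        }
        items.add(item);
        total = newTotal;
    }

    void removeItem(LineItem item) {
        // Only adjust the total if the item was actually part of this order.
        if (items.remove(item)) {
            total = total.subtract(item.subtotal());
        }
    }

    BigDecimal total() {
        return total;
    }

    // Read-only view, so callers cannot bypass addItem/removeItem.
    List<LineItem> items() {
        return Collections.unmodifiableList(items);
    }
}
```

Because the invariants are checked against all items of the order, a repository for such an aggregate has to load and save the order together with its items.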

This also means that you need to load an aggregate completely. This in turn means that it can't be too big, because otherwise you'd run into performance problems.

In your example of Post, Comment, and User, Post might form an aggregate with Comment. But in most systems the number of comments is close to unlimited and can be huge. I therefore would vote for making each entity in your example its own aggregate.
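
One way to picture "each entity is its own aggregate" is a comment that refers to its post and its author by identifier only. The following is a framework-free sketch: `CommentRepository`, `InMemoryCommentRepository`, and the field names are hypothetical, and the in-memory list merely stands in for a database table.

```java
import java.util.ArrayList;
import java.util.List;

// Comment is its own aggregate: it points at Post and User by id only,
// so saving a comment needs neither of them to be loaded.
final class Comment {
    final long postId;
    final long authorId;
    final String text;

    Comment(long postId, long authorId, String text) {
        this.postId = postId;
        this.authorId = authorId;
        this.text = text;
    }
}

// Hypothetical repository contract for the Comment aggregate.
interface CommentRepository {
    void save(Comment comment);

    List<Comment> findByPostId(long postId);
}

// In-memory stand-in for a database-backed implementation.
final class InMemoryCommentRepository implements CommentRepository {
    private final List<Comment> rows = new ArrayList<>();

    @Override
    public void save(Comment comment) {
        rows.add(comment);
    }

    @Override
    public List<Comment> findByPostId(long postId) {
        List<Comment> result = new ArrayList<>();
        for (Comment c : rows) {
            if (c.postId == postId) {
                result.add(c);
            }
        }
        return result;
    }
}
```

If the JPA associations from the question are kept instead of plain ids, `EntityManager.getReference(User.class, id)` returns a lazily initialized reference that can be assigned to the comment, typically without issuing a select for the user.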

For more input about aggregates and repositories, you might find "Spring Data JDBC, References, and Aggregates" interesting. It is about Spring Data JDBC, not Spring Data JPA, but the conceptual ideas do apply.

Jens, actually the purchase order analogy does not shed enough light on the problem (and neither do phone numbers): at first glance we may consider order items an `@ElementCollection`; however, if items are processed by different providers, then those items are already independent entities. IMO, when designing a domain model we need to rely on business activities rather than on software capabilities. - Andrey B. Panfilov, Sep 06 '22 at 07:56

Gurkan İlleez · Answer 2 · 2022-09-06T07:41:18.133

N+1 problem: this occurs when you fetch data in a loop. If you have 2000+ posts with comments, you need to avoid issuing a separate fetch for each post.

// Ex: the user's 2000 posts are fetched
for (Post post : userRepository.findById(1L).orElseThrow().getPosts()) {
   // Fetching in a loop: the lazy comments of each post are loaded with
   // their own query, so this costs up to 2000 additional queries.
   Set<Comment> comments = post.getComments();
}

Solution: create a repository for Post and fetch with a custom query. There are several ways to fetch eagerly, e.g. EntityGraph, FetchType.EAGER, JPQL ...

// Declared in PostRepository: loads the post and its comments in one query
@Query("select p from Post p left join fetch p.comments where p.id = :postId")
Set<Post> postsWithComments(@Param("postId") Long postId);

Set<Post> posts = postRepository.postsWithComments(1L);
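
The difference in query counts can be made concrete with a framework-free sketch. `CountingStore` and `NPlusOneDemo` are hypothetical names, and the store is an in-memory stand-in that only counts calls; it does not involve JPA or a real database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that counts how many queries it serves.
final class CountingStore {
    private final Map<Long, List<String>> commentsByPost = new HashMap<>();
    int queries = 0;

    CountingStore(int posts, int commentsPerPost) {
        for (long id = 1; id <= posts; id++) {
            List<String> comments = new ArrayList<>();
            for (int c = 0; c < commentsPerPost; c++) {
                comments.add("comment " + c + " on post " + id);
            }
            commentsByPost.put(id, comments);
        }
    }

    // One query: the ids of all posts.
    List<Long> findPostIds() {
        queries++;
        return new ArrayList<>(commentsByPost.keySet());
    }

    // One query per call: the comments of a single post (lazy loading).
    List<String> findComments(long postId) {
        queries++;
        return commentsByPost.get(postId);
    }

    // One query: all posts together with their comments (join fetch).
    Map<Long, List<String>> findPostsWithComments() {
        queries++;
        return new HashMap<>(commentsByPost);
    }
}

final class NPlusOneDemo {
    // Lazy access in a loop: 1 query for the posts plus 1 per post.
    static int lazyLoop(CountingStore store) {
        for (long postId : store.findPostIds()) {
            store.findComments(postId);
        }
        return store.queries;
    }

    // Fetching everything up front: a single query.
    static int joinFetch(CountingStore store) {
        store.findPostsWithComments();
        return store.queries;
    }
}
```

With 2000 posts, `NPlusOneDemo.lazyLoop(new CountingStore(2000, 3))` counts 2001 queries, while `NPlusOneDemo.joinFetch(new CountingStore(2000, 3))` counts 1.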

Even then, be careful when fetching data eagerly. If a post has a lot of comments, simply use a separate repository for Comment.

Set<Comment> findByPostId(Long postId);

Set<Comment> comments = commentRepository.findByPostId(1L);

If there are, say, 60000 comments for a single post, you need to fetch them with pagination, so that no single query has to load all of them.

Page<Comment> findByPostId(Long postId, Pageable pageable);

int pageSize = 2000;
int pageNumber = 0; // page numbers are zero-based
Page<Comment> page;
do {
   // each iteration loads at most pageSize comments
   page = commentRepository.findByPostId(1L, PageRequest.of(pageNumber, pageSize));
   // do something with page.getContent()
   pageNumber++;
} while (page.hasNext());
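
The number of iterations such a paged loop needs is the total element count divided by the page size, rounded up. A minimal sketch of that arithmetic follows; `Paging` and `pageCount` are hypothetical names, and in Spring Data the same number is available from `Page.getTotalPages()`.

```java
// Number of pages needed to cover totalElements rows at pageSize rows per page.
final class Paging {
    private Paging() {
    }

    static long pageCount(long totalElements, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Ceiling division without floating point.
        return (totalElements + pageSize - 1) / pageSize;
    }
}
```

For the 60000 comments mentioned above and a page size of 2000, `Paging.pageCount(60000, 2000)` gives 30; one more comment pushes it to 31.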

Beyond that, you can use caching strategies to improve performance.

You also need to define the acceptable response time for a request and measure the actual response time. You can fetch with a left join or simply issue another request. For long-running processes you can use async operations as well.
