5

I'm developing the chat feature in my app. The problem occurs when I try to retrieve a Chat object from my database, all the messages are retrieved duplicated twice.

Project structure

This is the simplified structure followed:

UML diagram

And this are the entities (simplified):

@Entity
public class Chat {
    @Id
    @GeneratedValue
    private Long id;

    @ManyToMany
    @JoinTable(
            name = "user_chat",
            joinColumns = @JoinColumn(name = "chat"),
            inverseJoinColumns = @JoinColumn(name = "user")
    )
    private Set<User> users = new HashSet<>();

    @OneToMany(mappedBy = "chat", cascade = CascadeType.ALL)
    private List<Message> messages = new ArrayList<>();


    public void addMessage(Message message){
        this.messages.add(message);
    }
}


@Entity
public class Message {
    private @Id @GeneratedValue Long id;

    @ManyToOne
    @JoinColumn(name="senderId", nullable=false)
    private User sender;

    @Column(name="value")
    private String value;

    @ManyToOne()
    @JoinColumn(name = "chat")
    private Chat chat;
}


@Entity
public class User implements UserDetails {
    private @Id
    @GeneratedValue
    Long id;

    @OneToMany(mappedBy = "sender", cascade = CascadeType.ALL)
    private List<Message> messagesSent;

    @ManyToMany(mappedBy = "users")
    private Set<Chat> chats = new HashSet<>();
}

And this is the repository which is failing:

public interface ChatRepository extends JpaRepository<Chat,Long> {
    @Query("SELECT c FROM Chat c JOIN c.users u WHERE u.id IN :usersIDs GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers")
    Optional<Chat> getChat(@Param("usersIDs") List<Long> usersIDs, int numberOfUsers);
}

The problem

Here you can see an example of how the object is retrieved:

Debugger

As you can see, every Message object is duplicated

What I suppose the problem is

Researching and executing the raw SQL query to the DB, I come across with the assumption that the problem can be this: Hibernate is making the query this way: Join between Chat and Users and then Join between the result and Messages. This gives messages duplicated because it joins each message with Chat-User1 and Chat-User2 rows. Just this way:

SELECT message.id as messageID, chat.id as chatID, user.id as userID FROM chat JOIN user JOIN message;

SQL response

What I have tried until here

In response of the assumption I explained before, I tried changing the repository query this way:

    public interface ChatRepository extends JpaRepository<Chat,Long> {
        @Query("SELECT c FROM Chat c JOIN c.users u LEFT JOIN c.messages m ON m.sender = u AND m.chat = c WHERE u.id IN :usersIDs GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers")
        Optional<Chat> getChat(@Param("usersIDs") List<Long> usersIDs, int numberOfUsers);
    }

Query breakdown:

SELECT c FROM Chat c JOIN c.users u Chat and Users joined

LEFT JOIN c.messages m ON m.sender = u AND m.chat = c Join between the result and messages

WHERE u.id IN :usersIDs Filtering just requested users

GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers") Retrieving the Chat which has all the users requested

As you can see, the difference lies in the ON m.sender = u AND m.chat = c. This way, the Messages must also join the User, like in this query:

    SELECT message.id as messageID, chat.id as chatID, user.id as userID FROM chat JOIN user JOIN message ON message.sender_id = user.id and message.chat = chat.id;

SQL response

and the LEFT before the JOIN is for getting the users who hasn't send message to the chat but are members.

However, this query is not solving the problem. Returns the Chat with the messages duplicated as well.

This is a list of other things I have tried:

  • Adding @Fetch (FetchMode.SELECT) before private List<Message> messages = new ArrayList<>();
  • Using @Transactional before the query
  • Overriding equals(Object o) in Message

This is how the message.equals() method looks like:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Message message = (Message) o;
    return Objects.equals(id, message.id) && Objects.equals(sender, message.sender) && Objects.equals(value, message.value) && Objects.equals(chat, message.chat);
}

@Override
public int hashCode() {
    return Objects.hash(id, sender, value, chat);
}

Other SO posts which I have read

JPA - EntityManager find method returns duplicates

Multiple fetches with EAGER type in Hibernate with JPA

Updates

In response on what @EdmarSM have told me in the comments, I have modified my query like this:

@Query("SELECT DISTINCT c FROM Chat c INNER JOIN FETCH c.users u LEFT JOIN FETCH c.messages m WHERE u.id IN :usersIDs GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers")

However, I've received the next error:

Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'close.m1_0.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I can see that this error is produced because of the INNER JOIN FETCH part. Without it, there is no error.

I also tried using the query without that part, just adding the DISTINCT to the query I've proposed above.

@Query("SELECT DISTINCT c FROM Chat c JOIN c.users u LEFT JOIN c.messages m ON m.sender = u AND m.chat = c WHERE u.id IN :usersIDs GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers")

But the response was exactly the same, the Chat with duplicated messages.

Second update

Again, according with @EdmarSM has told me, I tried with this:

public interface ChatRepository extends JpaRepository<Chat,Long> {
    @Query("SELECT DISTINCT c FROM Chat c INNER JOIN FETCH c.users u LEFT JOIN FETCH c.messages m WHERE c.id IN (SELECT c2.id FROM Chat c2 INNER JOIN c2.users u2 WHERE u2.id IN :usersIDs GROUP BY c2 HAVING COUNT(DISTINCT u2) = :numberOfUsers)")
    Optional<Chat> getChat(@Param("usersIDs") List<Long> usersIDs, int numberOfUsers);
}

I thought it has solved my problem but it hasn't. It just changed the way the messages were duplicated. Before, it used to be this way:

AABBCCDDEE...

And with this implementation, the pattern is ABCDEABCDE

In other words, it use to be:

  • Message 1
  • Message 1
  • Message 2
  • Message 2
  • Message 3
  • Message 3
  • Message 4
  • Message 4

And now is:

  • Message 1
  • Message 2
  • Message 3
  • Message 4
  • Message 1
  • Message 2
  • Message 3
  • Message 4
Lucas Tomic
  • 62
  • 2
  • 14
  • What does `equals()` and `hashCode()` look like for `Message`? – XtremeBaumer Jul 21 '23 at 08:46
  • I've just added it to the post. Is the generated by IntelliJ – Lucas Tomic Jul 22 '23 at 09:08
  • Just use `Objects.hash(id, value);`. Do the same for user and chat. They shouldn't be part of `equals()` and `hashCode()` – XtremeBaumer Jul 25 '23 at 06:04
  • Done. But, It hasn't solved the problem – Lucas Tomic Aug 08 '23 at 07:41
  • Have you tried making Chat.messages and Chat.users lazily fetched? Eager in Hibernate is treated the same as a fetch join, and this is known to cause problems on *toMany relations in ways that might make your queries more difficult for Hibernate to piece together. Not all providers have this issue and know how to filter out duplicates from the collections. I believe it is why they so commonly use Set instead of List (a different solution). Alternatively, you can try @BatchSize on the relationship(s). Basically anything other than eager/join fetching. – Chris Aug 08 '23 at 14:16
  • `this is incompatible with sql_mode=only_full_group_by` suggests you should perhaps try another mode – crizzis Aug 27 '23 at 07:28

3 Answers3

4

When your query executes, your model will always return a cartesian plan composed of the 3 tables.

I believe that in this case what you should have to do is implement the DISTINCT clause into your projection, this way Hibernate would be able to compare the objects and add to the list of messages only one of each different message.

    public interface ChatRepository extends JpaRepository<Chat,Long> {
        @Query("SELECT DISTINCT c FROM Chat c INNER JOIN FETCH c.users u LEFT JOIN FETCH c.messages m WHERE c.id IN (SELECT c2.id FROM Chat c2 INNER JOIN c2.users u2 WHERE u2.id IN :usersIDs GROUP BY c2 HAVING COUNT(DISTINCT u2) = :numberOfUsers)")
        Optional<Chat> getChat(@Param("usersIDs") List<Long> usersIDs, int numberOfUsers);
    }

I have also added FETCH to your join to retrieve messages into a single shot, but that also can make the query bigger and add another instance of the user table to retrieve the message's user instead of reusing the same of the chat's user.

Besides distinct would have no effect on database query, Hibernate would know that the result must pass on a reduction process to eliminate duplicity.

Also, this article could be helpful: https://vladmihalcea.com/jpql-distinct-jpa-hibernate/

Notice that I've added the user count into a subquery, this way it is expected that you will not have an error in the group by clause.

EdmarSM
  • 151
  • 5
  • Hello, I'm sorry for the delay (I was on vacations). It returns me the next error: Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'close.m1_0.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by I will add it in details on the post, so you can see it better – Lucas Tomic Aug 08 '23 at 07:32
  • I think I have figured out the problem, I am going to change my answer. – EdmarSM Aug 08 '23 at 15:16
  • I've realizaed that this hasn't solved the problem. I thought It has, but I realized that the message are still duplicated. The difference is that they use to be duplicated this way: AABBCCDDEE..., and with this answer they are duplicated this way: ABCDEABCDE – Lucas Tomic Aug 24 '23 at 16:24
1

It is totally correct to receive duplicated messages, as you receive same chat for two different users. So first chat has its messages, second chat has its messages, because it is the same chat, messages are duplicated. Had you had received three users, you would have received triplicated messages.

The problem is that you should select separately users and messages having your relational diagram.

What you are trying to select is those chats that have number of users from a list of users. I would rather select just the distinct chat ids, then retrieve the list of messages by chat id separately. Otherwise you combine two separate relations chats-users and chat-messages and want them to match each other - impossible.

Change your query to:

"SELECT DISTINCT c.id FROM Chat c JOIN c.users u WHERE u.id IN :usersIDs GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers" 

then fetch chats by id, and from each chat get users or get messages, which will be two separate queries with no duplicates.

You can also try to change

@OneToMany(mappedBy = "chat", cascade = CascadeType.ALL)
    private List<Message> messages = new ArrayList<>();

to

@OneToMany(mappedBy = "chat", cascade = CascadeType.ALL)
    private Set<Message> messages = new HashSet<>();

This can help, have not tried. Though if it does not throw an exception it will do.

Mircea Sirghi
  • 310
  • 1
  • 6
  • Thank you for your answer. Sounds good, seems like I'm trying to "merge" two chats in one. I think I couldn't see it before because I don't fully understand the way JPA traduces JPQL to SQL, do you think that could be my problem? By the way, I can't use `Set` because I need the messages to be sorted. – Lucas Tomic Aug 27 '23 at 16:56
  • And one thing more. Using the SQL sentence ""SELECT c FROM Chat c WHERE c.id = :id" I get the chat with the messages duplicated 8 times. I don't understand that neither – Lucas Tomic Aug 27 '23 at 17:12
  • Interesting, I can imagine such case only if you select using previous query having 8 users, your result gets cashed, and you select the same id again which is returned from cash. Though, if possible, please share the generated query. – Mircea Sirghi Aug 27 '23 at 18:22
  • @LucasTomic I have tested your configuration, there are no duplication at all in either cases. I suspect you really have duplication in your DB. The problem seems to be when you insert a message you insert it for each user. If you don't mind I can add my project in github so you can have a look. – Mircea Sirghi Aug 27 '23 at 23:19
  • Yes, please add it. However, if I watch my database in MysqlWorkbench I see no duplication. Also, when I see the duplicity in my debugger the duplicated messages has the same ID (which is the primary key) so, they can't be duplicated in the databse – Lucas Tomic Aug 28 '23 at 09:16
  • Here you are the repo: https://github.com/hackus/TriangleRelations, test is failing because of Optional it should actually be List – Mircea Sirghi Aug 28 '23 at 16:55
  • I'm sorry, there was no way for me to run your project. I opened an Issue in your GitHub – Lucas Tomic Aug 29 '23 at 11:47
  • Fixed, please check. It was because I have used an older hibernate version trying to simulate your case. Also you may have to delete .idea folder and run invalidate cash. – Mircea Sirghi Aug 29 '23 at 12:36
  • I'been testing your repo and you're right. It works properly. So, the problem in my project must be in other part. – Lucas Tomic Aug 30 '23 at 07:14
  • Yes, you have some relations(foreign keys, tables) in your database that confuses hibernate to create wrong queries. – Mircea Sirghi Aug 30 '23 at 07:30
  • I can add an init SQL script and remove ddl-auto: update, so that you test your schema directly. – Mircea Sirghi Aug 30 '23 at 07:41
  • Here you are, the branch to better simulate database through init script https://github.com/hackus/TriangleRelations/tree/feature/addinitscript you can update https://github.com/hackus/TriangleRelations/blob/feature/addinitscript/src/main/resources/data.sql so that it matches your DB schema. – Mircea Sirghi Aug 30 '23 at 07:59
  • I think I've found the problem. I'm updating the post in a moment – Lucas Tomic Aug 30 '23 at 08:47
  • Please update, I am keen to understand where the issues is. – Mircea Sirghi Aug 30 '23 at 10:40
0

Here we go. Finally, I coulnd't find any clean solution. The better I could do was:

@Query("SELECT c FROM Chat c JOIN c.users u WHERE u.id IN :usersIDs GROUP BY c HAVING COUNT(DISTINCT u) = :numberOfUsers")
Optional<Chat> getChatWithDuplicatedMessages(@Param("usersIDs") List<Long> usersIDs, int numberOfUsers);

@Query("SELECT DISTINCT m FROM Chat c JOIN c.messages m WHERE c.id = :chatId")
List<Message> getMessagesByChat(Long chatId);

And override the messages of the Chat retrieved in the first method, with the messages retrieved in the second method.

Pretty dirty solution, I know. But is all I could manage to do. If someone knows how to face the problem in a cleaner way (modifying the SQL sentence to retrieve the Chat without duplicity), I will mark their solution as the correct one.

Lucas Tomic
  • 62
  • 2
  • 14