0

I have 2 tables:

chats (id, ..., chat_status_id) // (about 28k records)
chat_messages(id, chat_id, send_date, ...) // (about 1 million records)

I need to get chats of certain status with latest message.

This is the select I am using, but it's pretty slow (it works in the end):

SELECT c.*,
       p1.*
FROM   chats c
       JOIN chat_messages p1
         ON ( c.id = p1.chat_id )
       LEFT OUTER JOIN chat_messages p2
                    ON ( c.id = p2.chat_id
                         AND ( p1.send_date < p2.send_date
                                OR ( p1.send_date = p2.send_date
                                     AND p1.id < p2.id ) ) )
WHERE  p2.id IS NULL
AND c.chat_status_id = 1
ORDER  BY p1.send_date DESC 

I do not know howto optimize it.

Rick James
  • 135,179
  • 13
  • 127
  • 222
Giacomo M
  • 4,450
  • 7
  • 28
  • 57
  • An additional _historical_ archive table. A search engine with more filtering. A cache of more popular messages (when there is a usage count). Also check the **explain** command. Also you do not seem to use JPA (like eclipseLink) with java entities. Which would cache java entities. (Though that is not necessarily an advise to use JPA.) – Joop Eggen Aug 02 '22 at 10:41
  • @JoopEggen I will chek all of your advices, except java (this is a php laravel project). thanks – Giacomo M Aug 02 '22 at 10:46
  • Does this answer your question? [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/questions/3191623/how-to-optimize-queries-in-a-database-the-basics) – philipxy Aug 02 '22 at 10:59
  • 1
    Which table is `chat_status_id` in? (Please qualify it. And provide `SHOW CREATE TABLE`) – Rick James Aug 02 '22 at 18:06
  • [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) [Re SQL "performance".](https://stackoverflow.com/a/24196511/3404097) – philipxy Aug 02 '22 at 23:17

2 Answers2

1

I would start with a few index updates. First, your WHERE clause is based on the CHATS table status, that should be first in your index order, but can have the ID as well AFTER to support the joins. Next, your messages table. Similarly, those are JOINed based on the CHAT ID, not its own unique table ID as the primary consideration, but does use the ID as the carry-along for conditional testing of p1.id < p2.id. Suggestions as follows.

Table          Index
Chats          ( Chat_Status_Id, id )
Chat_Messages  ( Chat_id, id, send_date )
DRapp
  • 47,638
  • 12
  • 72
  • 142
  • I suggest a change to `Chat_Messages: (chat_id, send_date, id)` – Rick James Aug 02 '22 at 18:10
  • @RickJames, I might agree, but if their data is such that the IDs are always in sequence relative to the send_date (via auto-increment, and not across multiple databases where that might not be the case), their query could be simplified with p1.ID < p2.id without concern of the date. But agree send_date/id could be switched. – DRapp Aug 02 '22 at 22:14
0

Give this a try:

SELECT  c.*, p1.*
    FROM  chats c
    JOIN  chat_messages p1  ON ( c.id = p1.chat_id )
    WHERE  NOT EXISTS 
    (
        SELECT  1
            FROM  chat_messages p2
            WHERE  c.id = p2.chat_id
              AND  ( p1.send_date < p2.send_date
                      OR  ( p1.send_date = p2.send_date
                              AND  p1.id < p2.id ) ) 
    )
    WHERE  c.chat_status_id = 1
    ORDER BY  p1.send_date DESC 

With

chats:  INDEX(chat_status_id, id)
chat_messages:  INDEX(chat_id, send_date,  id)
Rick James
  • 135,179
  • 13
  • 127
  • 222