0

I have a really big select that is a little bit slow and I'd like some help to improve it.

select c.nome, p.foto, c.user, p.user, p.id, p.data, p.titulo, p.youtube, pp.foto, count(DISTINCT likes.user) as likes_count, count(distinct comentarios.id) as comentarios_count, count(DISTINCT l2.user) as count2

from posts p 

join cadastro c on p.user=c.id 
left join profile_picture pp on p.user = pp.user
left join likes on likes.post = p.id
left join comentarios on comentarios.foto = p.id and comentarios.delete = 0  
left join likes l2 on l2.post = p.id and l2.user = ?

where p.user=? and p.delete='0'
group by p.id
order by p.id limit ?

Where should I add indexes to speed my select? in all fields with on and where? like: p.user, c.id, pp.user, p.delete... isn't it too much?

RGS
  • 4,062
  • 4
  • 31
  • 67
  • Please read these [Tips for asking a good SQL question](http://meta.stackoverflow.com/a/271056/). Pay particular attention to the section on query performance. Then please [edit] your question to give us more information, so we can help you. – O. Jones May 04 '18 at 10:34
  • how many rows do you have in your tables? Do you plan to use this query only with `limit` or that is just first sample of pagination to get other portions with offset later? – Alex May 18 '18 at 23:55

2 Answers2

2

One good way to speed up this query is to refactor it to do a deferred join. The objective is to do your SELECT ... ORDER BY ... LIMIT... operation on the result set with the smallest possible number of columns. Why is this important? Ordering big result sets is more expensive than ordering small ones, especially when LIMIT discards most of the results of the ordering.

So, start with this subquery:

               SELECT p.id, c.id
                 FROM posts p
                 JOIN cadastro c ON p.user=c.id 
                WHERE p.user=? and p.delete='0'
                ORDER BY p.id
                LIMIT ?

There you have the relevant posts.id and cadastro.id values for your query. You can speed this up with a compound covering index on posts(user, delete): the query planner can satisfy this subquery completely from a scan of part of that compound index.

Then you join this into a version of your main query.

    SELECT c.nome, p.foto, c.user, p.user, p.id, p.data, p.titulo, 
           p.youtube, pp.foto, 
           count(DISTINCT likes.user) as likes_count,
           count(distinct comentarios.id) as comentarios_count,
           count(DISTINCT l2.user) as count2
    FROM (
                   SELECT p.id pid, c.id cid
                     FROM posts p
                     JOIN cadastro c ON p.user=c.id 
                    WHERE p.user=? and p.delete='0'
                    ORDER BY p.id, c.id
                    LIMIT ?
         ) selector
    JOIN posts p ON selector.pid = p.id
    JOIN cadastro c ON selector.cid =  p.user
    left join profile_picture pp on p.user = pp.user
    left join likes on likes.post = p.id
    left join comentarios on comentarios.foto = p.id and comentarios.delete = 0  
    left join likes l2 on l2.post = p.id and l2.user = ?
   where p.user=? and p.delete='0'
   group by p.id
   order by p.id limit ?

You need to redo the ORDER BY ... LIMIT ? operation because your left joins may increase the size of the final result set, and you need to limit it.

It's hard to tell what indexes will speed up the rest of the query without a lot more information about your tables. All those COUNT(DISTINCT...) operations are inescapably somewhat expensive. You might benefit from reading this: https://use-the-index-luke.com/

Pro tip You're using, and possibly misusing, a notorious extension to GROUP BY in MySQL. Your GROUP BY should say this, or the values c.nome and c.user may be chosen in an unpredictable way.

GROUP BY p.id, c.id

Pro tip Single-column indexes generally don't help queries or subqueries much: MySQL can only use one index per table in a query. So, covering indexes with the columns in the right order can help a lot. Don't just chuck in a bunch of indexes hoping to speed up queries.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
1

Add a composite index on post, in this order:

post:  INDEX(user, delete, id)
profile_picture:  (user, foto)
likes:  (post, user)
commentarios:  (foto, delete, id)

If I understand 'post' and cadastro (registry), the will be a cadastro entry for each post? Therefore, there is no need to include cadastro in the derived table.

Also, I assume there is at most one foto per person. (Otherwise the GROUP BY is in trouble, and O.Jones' won't get the right answer.) There is a fix if there can be more than one, but you want to show only one. (Use MAX.)

I am using subqueries in the SELECT clause to avoid the explode-implode of JOIN...GROUP BY.

I am unclear on the intent of l2.user = ?, but I left it alone.

SELECT  c.nome, p.foto, c.user, p.user, p.id, p.data, p.titulo,
        p.youtube,
        ( SELECT MAX(foto) FROM profile_picture
                           WHERE p.user = user ) AS foto,
        ( SELECT count(DISTINCT user) FROM likes
                           WHERE post = p.id ) as likes_count,
        ( SELECT count(distinct id) FROM comentarios
                           WHERE foto = p.id
                             AND delete  = 0 ) as comentarios_count,
        ( SELECT count(DISTINCT user) FROM likes
                           WHERE post = p.id
                             AND user = ? ) as count2
    FROM  
    (
        SELECT  p.id pid
            FROM  posts p
            WHERE  p.user=?
              and  p.delete='0'
            ORDER BY  p.id
            LIMIT  ? 
    ) selector
    JOIN  posts p  ON selector.pid = p.id
    JOIN  cadastro c  ON p.user = c.id
    ORDER BY  p.id
Rick James
  • 135,179
  • 13
  • 127
  • 222