1
SELECT m.*, pc.call_date                     
                    FROM messages m
                    LEFT JOIN customers c ON m.device_user_id = c.device_user_id
                    LEFT JOIN phone_call pc ON pc.id = (
                        SELECT MAX(pc2.id)
                        FROM phone_call pc2
                        WHERE pc2.device_user_id = c.device_user_id OR pc2.customer_id = c.customer_id
                    )

The issue with the above is the left join phone_call table to find out the latest phone call made for each record. phone_call table has GBs of data. With left join phone_call, it takes more than 30 secs to return data. without it less than a sec. so that table is the issue. is there a better way to achieve the same result as the above query?

M Khalid Junaid
  • 63,861
  • 10
  • 90
  • 118
SO-user
  • 1,458
  • 2
  • 21
  • 43
  • 1
    Code questions require a [mre]--including cut & paste & runnable minimal code & minimal representative data given as code. For SQL that includes DBMS & DDL, including constraints, indexes & tabular initialization. For SQL performance that includes EXPLAIN results & statistics. Please research & summarize. For SQL that includes basics of optimization/performance--immediately leading to indexes, plans, statistics & SARGability. [Tips for asking a good SQL question](https://meta.stackoverflow.com/a/271056/3404097) Ask re optimization after you have learned & applied those basics. [ask] – philipxy Jun 04 '20 at 16:58

4 Answers4

3

The way you phrased the query looks good to me for MySQL 5.7. But the OR in the subquery is a performance killer.

I would recommend the following index, so the correlated subquery executed quickly:

phone_call(device_user_id, customer_id, id) 

You might try and switch the first two columns in the index to see if one version or another has better effect.

Another thing you could try is change the subquery to use a sort and a row limiting clause rather than aggregation (with the same above index in place). There is guarantee it will improve things, but it is worth trying:

LEFT JOIN phone_call pc ON pc.id = (
    SELECT pc2.id
    FROM phone_call pc2
    WHERE 
        pc2.device_user_id = c.device_user_id 
        OR pc2.customer_id = c.customer_id
    ORDER BY pc2.id
    LIMIT 1
)

Finally, another idea is to split the subquery into two to avoid the OR:

LEFT JOIN phone_call pc ON pc.id = (
    SELECT MAX(id)
    FROM (
        SELECT MAX(pc2.id)
        FROM phone_call pc2
        WHERE pc2.device_user_id = c.device_user_id 
        UNION ALL
        SELECT MAX(pc3.id)
        FROM phone_call pc3
        WHERE pc3.customer_id = c.customer_id
    ) t
)

Or without intermediate aggregation:

LEFT JOIN phone_call pc ON pc.id = (
    SELECT MAX(id)
    FROM (
        SELECT pc2.id
        FROM phone_call pc2
        WHERE pc2.device_user_id = c.device_user_id 
        UNION ALL
        SELECT pc3.id
        FROM phone_call pc3
        WHERE pc3.customer_id = c.customer_id
    ) t
)

For the last two queries, you would need two indexes:

phone_call(device_user_id, id)
phone_call(customer_id, id)

EDIT

The above solutions using union all require MySQL 8.0 - in earlier versions, they fail because the subqueries are too deeply nested to reference columns from the outer query. So, another alternative is IN:

LEFT JOIN phone_call pc ON pc.id IN (
    SELECT pc2.id
    FROM phone_call pc2
    WHERE pc2.device_user_id = c.device_user_id 
    UNION ALL
    SELECT pc3.id
    FROM phone_call pc3
    WHERE pc3.customer_id = c.customer_id
)

This can also be phase with EXISTS - which I like better, because the predicates explicitly match the indexes definition, so it should be an easy decision for MySQL to use them:

LEFT JOIN phone_call pc ON EXISTS (
    SELECT 1
    FROM phone_call pc2
    WHERE pc2.device_user_id = c.device_user_id AND pc2.id = pc.id
    UNION ALL
    SELECT 1
    FROM phone_call pc3
    WHERE pc3.customer_id = c.customer_id AND pc3.id = pc.id
)

Again, this works under the assumption that you have the two following, multicolumn indexes:

phone_call(device_user_id, id)
phone_call(customer_id, id)

You can create the indexes as follow:

create index idx_phone_call_device_user on phone_call(device_user_id, id);
create index idx_phone_call_customer    on phone_call(customer_id, id);
GMB
  • 216,147
  • 25
  • 84
  • 135
  • @SO-user: do you have a *compound* index on the three columns, or an index on each column? – GMB May 31 '20 at 16:31
  • index on each column. and I've tried that order by id desc query already. no diff. I'll try your other two queries. those are some nice ideas. – SO-user May 31 '20 at 16:40
  • @SO-user: I meant a multi column index. An index on each column won’t help. – GMB May 31 '20 at 16:42
  • Okay. I'll look into that and get back. Have not tried it before. – SO-user May 31 '20 at 16:43
  • in the last two queries, c.device_user_id is not recognised since we use it inside a subquery of another. – SO-user May 31 '20 at 18:29
  • @SO-user: it seems this syntax only works on MySQL 8.0, not in earlier versions. I edited my answer with another solution for earlier versions. Please check. – GMB Jun 04 '20 at 21:08
  • in the latest query of yours, we have to choose the lastest id after the union. @GMB – SO-user Jun 10 '20 at 09:39
1

The MAX subquery can't use an index due to the OR condition. Split this subquery into two - one for each condition - and take the highest result using GREATEST():

SELECT m.*, pc.call_date                     
FROM messages m
LEFT JOIN customers c ON m.device_user_id = c.device_user_id
LEFT JOIN phone_call pc ON pc.id = GREATEST((
  SELECT MAX(pc2.id)
  FROM phone_call pc2
  WHERE pc2.device_user_id = c.device_user_id
), (
  SELECT MAX(pc2.id)
  FROM phone_call pc2
  WHERE pc2.customer_id = c.customer_id
))

Each subquery will need it's own index - which are

phone_call(device_user_id, id)
phone_call(customer_id, id)

If phone_call.id is the primary key and the table is using InnoDB, then you can omnit it from the index, since it will be appended implicitly.

Since one of the subqueries might return NULL you should use COALESCE() with a number smaller than any existing id. If id is AUTO_INCREMENT then 0 should be fine:

SELECT m.*, pc.call_date                     
FROM messages m
LEFT JOIN customers c ON m.device_user_id = c.device_user_id
LEFT JOIN phone_call pc ON pc.id = GREATEST(
  COALESCE((
    SELECT MAX(pc2.id)
    FROM phone_call pc2
    WHERE pc2.device_user_id = c.device_user_id
  ), 0), 
  COALESCE((
    SELECT MAX(pc2.id)
    FROM phone_call pc2
    WHERE pc2.customer_id = c.customer_id
  ), 0)
)
Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
0

Well, you will probably not like this answer, but, if this is going to be an important data and a frequent query, I would put last_call_date as a field in customer's table.

Pablo Santa Cruz
  • 176,835
  • 32
  • 241
  • 292
0

I believe your question is related to problem there are several ways to get the latest record as per your grouping criteria. One of them is using self joins and you could rewrite your query as

SELECT  m.*,
        pc.call_date                     
FROM messages m
LEFT JOIN customers c ON m.device_user_id = c.device_user_id
LEFT JOIN phone_call pc ON pc.device_user_id = c.device_user_id OR pc.customer_id = c.customer_id
LEFT JOIN phone_call pc2 ON (
    (pc.device_user_id = pc2.device_user_id OR pc.customer_id = pc2.customer_id) AND pc1.call_date < pc2.call_date
)
WHERE pc2.call_date IS NULL

In above query where clause is important to filter out rows with older date, You will also need to add a composite index on phone_call table

CREATE INDEX index_name ON phone_call(device_user_id,customer_id,call_date);

The query optimizer cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index.

Moreover please execute EXPLAIN PLAN for your query to see performance related issues and to make sure that right indexes are being used.

Retrieving the last record in each group - MySQL

M Khalid Junaid
  • 63,861
  • 10
  • 90
  • 118