0

I have two tables, Listings and Products. The Listings table has approx 1.2M records and the product table has approx 600K records. When I run the following query on the Listings table, the results return in 56.6 ms.

SELECT    l.field_name
FROM      Listing l
WHERE     l.deletedAt IS NULL
GROUP BY  l.field_name HAVING COUNT(l.field_name) > 1

When I add in a LEFT JOIN, it takes nearly 50 seconds to return the results..

SELECT    l.field_name, p.name
FROM      Listing l
LEFT JOIN Product p
 ON       l.product_id = p.id
WHERE     l.deletedAt IS NULL
GROUP BY  l.field_name HAVING COUNT(l.field_name) > 1
ORDER BY  l.field_name

In the Listings table there may be duplicates of the field_name, thus the HAVING statement.

How can I optimize this query to return results quicker. Thanks

Explain output

+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                 | rows    | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | l     | ALL    | field_name    | NULL    | NULL    | NULL                | 1022146 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 4       | dbName.l.product_id | 1       | Using index"                                 |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+----------------------------------------------+
miken32
  • 42,008
  • 16
  • 111
  • 154
LargeTuna
  • 2,694
  • 7
  • 46
  • 92
  • 1
    Usually the first step is to check indexes. Also, you might have better results using a sub-query instead of doing a join. Typically you shouldn't use a join to limit your query. That belongs in the where clause if you aren't selecting anything from the second table. – Jonathan Kuhn Nov 13 '15 at 20:16
  • 2
    What is the purpose of the left join in that query? Nothing from the `Product` table is used. – samlev Nov 13 '15 at 20:22
  • @samlev but it would create duplicate rows where duplicate values in product exist. Or where duplicate values exist in listing. – Jonathan Kuhn Nov 13 '15 at 20:25
  • My bad, it has been updated now. – LargeTuna Nov 13 '15 at 20:26
  • You are grouping and ordering on a field you don't select? Not that there is anything intrinsically wrong with that; but it doesn't seem to serve a purpose in this case. What exactly is the question the query is trying to answer? The HAVING makes it seem you are looking for "products listed (as a particular listing field name) more than once". – Uueerdo Nov 13 '15 at 20:36
  • Post the results of EXPLAIN on your slow query. – Sasha Pachev Nov 13 '15 at 20:42
  • Can `listing.product_id` have `null` values? – trincot Nov 13 '15 at 21:02
  • No. that field cannot be null. – LargeTuna Nov 13 '15 at 21:03
  • @LargeTuna query still may be a little confused. As the query currently is shown, it will return all field_name's that have appeared in more than one undeleted listing record along with one (effectively random) product.name that had been associated with that field_name. Again, what exactly is the question the query is trying to answer? – Uueerdo Nov 14 '15 at 01:01

1 Answers1

0

The speed here is fundamentally based on how many records are on each side of the JOIN.

It looks like you're searching for records that were marked as "not deleted", and "has field_name". There's also a GROUP BY that will slow things down if you try to do it after a JOIN, and it looks like you could perform the GROUP BY before joining, too.

In that case, you should first query for all that "filtering" exclusively, and put them in the right order to begin with, and join that smaller set of records against the other table, again asking for sorted results.

Each DBMS has a slightly different SQL query to achieve that. In advanced DBMS, you could make use of a WITH ... AS ... SELECT ... ("with" clause) syntax.

With MySQL, you'd have to come up with an alternative way to express the same thing. A few samples of how to do that are here:

A pseudo code for MySQL should be kind of like this:

SELECT
  a.name, b.description
FROM
  (subset of "a" table, order by id [group by could be here, too])
  JOIN
  (subset of "b" table, order by id)

Please refer to the linked answers to see more complete code samples.

Community
  • 1
  • 1
starlocke
  • 3,407
  • 2
  • 25
  • 38