How to make query faster with group by and order by

Question

I have large table with 615 million records in items and 10 thousand in markets

the below query takes 5+ seconds to execute

how can I make it faster. ?

SELECT items.name,items.ID as itemID,items.market_id as market_id,markets.name as marketname 
from items,markets 
where markets.ID = items.market_id and 
items.name like '%bab%' 
group by items.name 
order by items.name ASC 
limit 0,10

Explain

-> Limit: 10 row(s)  (actual time=4657.094..4657.096 rows=9 loops=1)

-> Sort: items.`name`, limit input to 10 row(s) per chunk  (actual time=4657.093..4657.094 rows=9 loops=1)

-> Table scan on <temporary>  (actual time=0.002..0.003 rows=9 loops=1)

-> Temporary table with deduplication  (cost=1141592.47 rows=625722) 
(actual time=4657.070..4657.072 rows=9 loops=1)

-> Nested loop inner join  (cost=1141592.47 rows=625722) (actual time=0.056..4541.471 rows=85329 loops=1)

-> Filter: (items.`name` like \'%bab%\')  (cost=593647.14 rows=625722) (actual time=0.043..4456.394 rows=85329 loops=1)

-> Table scan on items  (cost=593647.14 rows=5632061) (actual time=0.037..2864.754 rows=6153656 loops=1)

-> Single-row index lookup on markets using PRIMARY (ID=items.market_id)  (cost=0.78 rows=1) (actual time=0.001..0.001 rows=1 loops=85329)

Your query is likely invalid assuming that `items.name` is not the primary key. Also, you have a `LIKE` condition in your where clause which is not sargable. This will make it hard to optimize your query. And finally, I don't see the point of using `GROUP BY` here, since you select no aggregates. — Tim Biegeleisen, Apr 01 '21 at 17:04
From the picture (and from the comment Tim B. gave) it was already clear that `Table scan on items` is the cause of your problem. — Luuk, Apr 01 '21 at 17:17
See https://stackoverflow.com/questions/5905125/mysql-improve-search-performance-with-wildcards the reason for a full table scan is you are doing a "contains" that mysql cannot utilize the index when you have aleading wildcard. How long does `like 'bab%'` take (this is a starts with) ? — Rippo, Apr 01 '21 at 17:24
@Rippo will FTS be better solution ? change name column to text and do a match ? — skcrpk, Apr 01 '21 at 17:47
@skcrpk did you read the answer to the link I sent, any FTS using a leading wildcard is goinf to take a long time, even removing it still takes time on 165M rows. — Rippo, Apr 01 '21 at 19:14
@Rippo yes I did read that and tried to implement I was able to change the column to text and add a full text index it took long but result still took 2 seconds same as using ‘bab%’ as it utilises indexs in this case — skcrpk, Apr 01 '21 at 19:32
The query is poorly formed if there are multiple rows for a given `name`. The query would not even complete on 8.0. To phrase it another way, _which_ `market_id` and items.ID` do you want to display when you group by items.name? — Rick James, Apr 02 '21 at 03:10
@RickJames I will add tables schema in few hours so you can suggest a better approach ? — skcrpk, Apr 02 '21 at 03:14
`MIN(market_id)`? AVG(market_id) (I doubt it)? GROUP_CONCAT(market_id)? — Rick James, Apr 02 '21 at 03:16

Luuk · Answer 1 · 2021-04-01T18:42:13.763

You might get a (small !?) improvement, when you do this:

SELECT items.name,items.ID as itemID,items.market_id as market_id,markets.name as marketname 
from items
inner join markets on markets.ID = items.market_id
where markets.ID IN (SELECT items.market_id 
                     FROM marketd
                     WHERE items.name like '%bab%' 
                     order by items.name ASC 
                     limit 0,10)
order by items.name

EDIT: oops, that's tru you cannot do LIMIT in a subquery....

If you have MySQL 8.0+, you can do:

WITH cte AS (
   SELECT items.market_id 
   FROM marketd
   WHERE items.name like '%bab%' 
   order by items.name ASC 
   limit 0,10
)
SELECT items.name,items.ID as itemID,items.market_id as market_id,markets.name as marketname 
from items
inner join markets on markets.ID = items.market_id
where markets.ID IN ( SELECT market_id 
                      FROM cte )
order by items.name;

also, you might need to add an index on market_id, if that does not exists yet. With that, the cte will use that index.

getting , Error Code: 1235. This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' — skcrpk, Apr 01 '21 at 17:35
edit done, because of the limit which cannot be used in a subquery — Luuk, Apr 01 '21 at 18:28
it still took too much time but most importantly the result is not same as the original query — skcrpk, Apr 01 '21 at 19:34

score 0 · Answer 2 · answered Apr 02 '21 at 01:00

0

This should work with any version of MySQL at least since 5.0:

SELECT items.name,
       items.ID as itemID,
       items.market_id as market_id,
       ( SELECT markets.name 
              FROM markets WHERE markets.ID = items.market_id
       ) as marketname 
    FROM items
    WHERE items.name like '%bab%' 
    ORDER BY items.name ASC 
    LIMIT 10

That should eliminate the GROUP BY.

Indexes:

items:  (name, ID, market_id)
markets:  presumably `ID` is the PRIMARY KEY

answered Apr 02 '21 at 01:00

Rick James

135,179
13
127
222

it returns same items.names but when I add group by to your query it returns different items.name , which is what I want , I have created suggested index but that is not being used , still query takes more than 4+ seconds with group by and without it takes 3+ seconds – skcrpk Apr 02 '21 at 01:18
@skcrpk - by "same items.names", you mean duplicates? – Rick James Apr 02 '21 at 02:57
Not duplicate each market can have same name items – skcrpk Apr 02 '21 at 02:58
Then which market_id should be displayed after you group by the name? – Rick James Apr 02 '21 at 03:11

How to make query faster with group by and order by

2 Answers2