How to use indexing for multiple table join in a single query?

Question

Query with multiple table joins is taking too much time. How to do the indexing for the query below:

QUERY:

select  ri.id, LOWER(REPLACE(ri.name,' ','-')) as urlName,ri.name,
ri.logo,
group_concat(distinct rc.cuisine order by  rc.cuisine asc SEPARATOR ',
        '
            ) as 'cuisine_string', rc.cuisine, rai.rating, rai.min_order_amount,
        rai.latitude, rai.longitude, rai.delivery_time, rdf.start_dist,
rdf.end_dist, rdf.fee, ri.address_search, rai.delivery_facility,
ri.status as 'restaurant_status', rt.day, rt.status 'opening_status',
rt.opening_time, rt.closing_time,
' - ',rdf.end_dist,
' km',' : ','₹',fee) SEPARATOR '~') as 'delivery_fee_string',
GROUP_CONCAT(distinct CONCAT(rdf.start_dist, ( SELECT  MATCH (ri.address_search) AGAINST ('Kahilipara,
                Guwahati, Assam, India') as relevance
            from  restaurant_info ri
            where  ri.id = rai.restaurant_id
              and  ri.id = rt.restaurant_id
              and  ri.id = rdf.restaurant_id
              and  ri.id = rc.restaurant_id) as ord , ( 3959 * acos ( cos ( radians(26.1428694) ) * cos( radians( rai.latitude ) ) * cos( radians( rai.longitude ) - radians(91.768487) ) + sin ( radians(26.1428694) ) * sin( radians( rai.latitude ) ) ) ) AS distance
    from  restaurant_info ri
    inner join  restaurant_additional_info rai  ON ri.id = rai.restaurant_id
    inner join  restaurant_timing rt  ON ri.id = rt.restaurant_id
    inner join  restaurant_delivery_fee rdf  ON ri.id = rdf.restaurant_id
    inner join  restaurant_cuisine rc  ON ri.id = rc.restaurant_id
    where  ri.status = 1
      and  rt.status = 1
      and  rt.day = lower(DATE_FORMAT(NOW(),'%a'))
      and  rai.delivery_facility != 1
      and  rai.min_order_amount <= 100
      and  rai.rating <= ''
      and  MATCH (ri.address_search) AGAINST ('Kahilipara, Guwahati,
                Assam, India'
                 )
      and  rt.opening_time < '12:40:21'
      and  rt.closing_time > '12:40:21'
    group by  ri.id
    having  rdf.start_dist = 0
      and  distance < 3.10686
    order by  distance asc
    LIMIT  100 OFFSET 0

DESCRIPTION OF PROBLEM- This Query is taking 8.5 seconds to run. My application contains 1-2 such queries.Therefore loading time approaches 1 min on server.

Can anyone help me to apply indexing on this query?

Look into MySQL's geospatial support if you really need performance here. — Tim Biegeleisen, May 04 '19 at 08:59
Use EXPLAIN on the query... but I guess it's the calculation that consumes the time. How many reocrds are there? — Honk der Hase, May 04 '19 at 09:03
2000 plus records are there and I am using join for 3 tables @LarsStegelitz — Debu, May 04 '19 at 09:05
@TimBiegeleisen This question is not a duplicate of anyone. I am fetching based on address inputted on form then matching it with my tables against a distance. — Debu, May 04 '19 at 09:10
Please try one or more of the indexing techniques in the duplicate link. — Tim Biegeleisen, May 04 '19 at 09:11
The duplicate link only on the getting data according to the distance and not indexes which I don't need. My query is already giving me the desired result but I need to look into the performance of the query so I am asking your help on implementing indexes in this particular query. @TimBiegeleisen — Debu, May 04 '19 at 09:16
I reopened because https://stackoverflow.com/questions/574691/mysql-great-circle-distance-haversine-formula did not cover all the issues being raised. — Rick James, May 05 '19 at 23:55

score 0 · Answer 1 · answered May 06 '19 at 00:16

It would help to have SHOW CREATE TABLE for the various tables. Also, EXPLAIN SELECT ....

where  ri.status = 1
  and  rt.status = 1
  and  rt.day = lower(DATE_FORMAT(NOW(),'%a'))
  and  rai.delivery_facility != 1
  and  rai.min_order_amount <= 100
  and  rai.rating <= ''
  and  MATCH (ri.address_search) AGAINST ('Kahilipara, Guwahati, Assam, India')
  and  rt.opening_time < '12:40:21'
  and  rt.closing_time > '12:40:21'
group by  ri.id
having  rdf.start_dist = 0
  and  distance < 3.10686
order by  distance asc
LIMIT  100 OFFSET 0

Since there is a FULLTEXT condition in the WHERE clause, it will probably do that first. However, if there are thousands of rows in those areas, we need to dig further.

rai.rating <= '' seems very strange; what are you expect from "<=''"?

having rdf.start_dist = 0 -- it seems like this belongs in the WHERE clause, not HAVING.

One big problem is you have range tests scattered across multiple tables:

rt -- just a certain day of the week
rai -- limited amount, etc
ri -- certain regions
rt -- time range, complicated by using 2 columns

Even if these fields were in the same table, it would still be costly to compute.

distance < ... -- This could be optimized with a "bounding box", as discussed in the link, but it won't help much because of the other issues.

JOIN ... GROUP BY id -- This is probably an "explode-implode" scenario. What happens is the JOINs expand the number of 'rows' being looked at by joining 1:many, etc. Then the GROUP BY is trying to get back to just one row per id. If it weren't for all the filtering, there are workarounds.

Meanwhile, the GROUP BY may be improperly used. See ONLY_FULL_GROUP_BY.

order by distance LIMIT .. -- The query is forced to gather all the rows (thousands?) before sorting and delivering up to 100.

rt.day = lower(DATE_FORMAT(NOW(),'%a')) - with a suitable collation, you don't need LOWER().

MATCH (ri.address_search) .. and distance < .. seem redundant.

Why are there 5 tables, each keyed by "restaurant id"? (There could be valid reasons, but it looks fishy.) Are they 1:1 or 1:many?

They are 1:1 relationship. That is why we have restaurant id keyed in each of the tables. @Rick James — Debu, May 06 '19 at 08:29

How to use indexing for multiple table join in a single query?

1 Answers1