0

I have following query which is used to autocomplete school names that user is searching. If $query = har

SELECT *, 
       CASE 
         WHEN text LIKE '$query' THEN 1 
         WHEN text LIKE '$query%' THEN 2 
         WHEN text LIKE '%$query%' THEN 3 
       end AS priority 
FROM   (SELECT b.school_name AS `text`, 
               'school'      AS `type`, 
               b.slug          AS `id`, 
               n.neighbourhood AS 'params' 
        FROM   schools b 
               LEFT JOIN school_addresses ba 
                      ON ( b.id = ba.school_id ) 
               LEFT JOIN neighbourhoods n 
                      ON ( ba.neighbourhood_id = n.id ) 
        WHERE  b.entity_status = 'active' 
               AND ba.city_id = '$city' 
               AND b.visibility != 'delisted' 
               AND (b.school_name LIKE '$query' 
                    OR b.school_name LIKE '$query%' 
                    OR b.school_name LIKE '%$query%') 
        UNION 
        SELECT tg.OPTION AS `text`, 
               'tags'    AS `type`, 
               tg.OPTION AS `id`, 
               tg.OPTION AS 'params' 
        FROM   tags t 
               LEFT JOIN tag_options tg 
                      ON ( t.id = tg.tag_id ) 
        WHERE  t.tag = 'Cuisines' 
               AND (tg.OPTION LIKE '$query' 
                    OR tg.OPTION LIKE '$query%' 
                    OR tg.OPTION LIKE '%$query%') 
        UNION 
        SELECT category   AS `text`, 
               'category' AS `type`, 
               category   AS `id`, 
               category   AS 'params' 
        FROM   categories 
        WHERE  category LIKE '$query' 
               OR category LIKE '$query%' 
               OR category LIKE '%$query%' 
        UNION 
        SELECT area   AS `text`, 
               'area' AS `type`, 
               id     AS `id`, 
               id     AS 'params' 
        FROM   areas 
        WHERE  city_id = '$city' 
               AND (area LIKE '$query' 
                    OR area LIKE '$query%' 
                    OR area LIKE '%$query%') 
        UNION 
        SELECT district    AS `text`, 
               'districts' AS `type`, 
               id          AS `id`, 
               id          AS 'params' 
        FROM   districts 
        WHERE  city_id = '$city' 
               AND (district LIKE '$query' 
                    OR district LIKE '$query%' 
                    OR district LIKE '%$query%') 
        UNION 
        SELECT neighbourhood   AS `text`, 
               'neighbourhood' AS `type`, 
               id              AS `id`, 
               id              AS 'params' 
        FROM   neighbourhoods 
        WHERE  city_id = '$city'
               AND (neighbourhood LIKE '$query' 
                    OR neighbourhood LIKE '$query%' 
                    OR neighbourhood LIKE '%$query%') 
        ) AS t1 
WHERE 1 
ORDER  BY priority
LIMIT  5

This is the result it yields

'text'      'type'     'id'           'params'     'priority'
Harvard     mba     harv-ny-city     new york       2
Harcum      mba     har-pa           Pa             2
Harford     mba     harf-md          Maryland       2

My question is how can i search using both "Name of the school" 'text' in above query and "Place of the school" 'params' in above query. Like if $query = 'harford ma' Then it should yield results like this:

   'text'      'type'     'id'           'params'     'priority'
    Harford     mba     harf-md          Maryland       2
    Harford     mba     harv-ny-city     new york       2
    Harford     mba     har-pa           Pa             2

I been playing with this for almost whole day now, with no results.

Logic-> This is an auto search functionality in my site. User can try to look for school names or cities of those schools. But user can also search both also. For example there is School iit in bombay, delhi, chennai. User can search like: "iit de" -> as soon as user type this it should auto complete and bring in iit Delhi at top search, then other iit locations. In total it should show 5 results.

NoviceMe
  • 3,126
  • 11
  • 57
  • 117
  • 3
    Danger! You are using user-provided $query, $city - you will get sql injection! What happens if $query is `'; drop table areas; --` ? – AMADANON Inc. Nov 13 '14 at 21:18
  • @AMADANONInc. $query is what user type in search bar. Is this wrong way to do it? – NoviceMe Nov 13 '14 at 21:21
  • 1
    That is exactly the problem. How would your query look if the user siad they wanted to search for `'; drop table areas; --`? Answer: `SELECT *, CASE WHEN text LIKE ''; drop table areas; --` - they've just dropped your table. Read up about sql injection attacks. There are ways around this, but the details vary from language to language. In short, you should never just put user-provided text into a query like this. – AMADANON Inc. Nov 13 '14 at 21:23
  • @AMADANONInc. - thanks i will read that. Any idea how i can query the other part? – NoviceMe Nov 13 '14 at 21:29
  • `What happens if $query is '; drop table areas; -- ` Exception throwed, Injections is not so simple today ) – vp_arth Nov 21 '14 at 14:03
  • Why should querying "harford ma" return the second and third lines in your expected result, if neither contains "harford" nor "ma"? – RealSkeptic Nov 21 '14 at 19:51
  • @RealSkeptic - That is what i am trying to achieve. It is auto complete search. I have to some how find out expected results if user searches above query. It is just an example to show what i am trying to achieve. 2 & 3 are dummy values. – NoviceMe Nov 21 '14 at 19:57
  • Well, perhaps this may help: http://stackoverflow.com/questions/5322917/how-to-compute-similarity-between-two-strings-in-mysql – Karolis Nov 21 '14 at 20:06
  • can you explain the exact logic you are wanting to use in your search – Marshall Tigerus Nov 21 '14 at 20:30
  • @MarshallTigerus - Explained logic in above in question. Please have a look. – NoviceMe Nov 21 '14 at 21:07
  • it sounds like the easiest way to do this is to union all two queries, one to get the top 1 match for your user's input, then a union all to match the top 4 of a query on some subset of the user's input (probably everything before the first space). – Marshall Tigerus Nov 21 '14 at 21:16
  • Look, you have not explained the logic. I see no logic in looking for "harford ma" and getting "Harvard new york". There is no connection between my query string and the result. So I ask again: what is the supposed connection between the query and the results? – RealSkeptic Nov 21 '14 at 23:08
  • @RealSkeptic - Search should show all school names with string harford, when we search "harford ma" but should show "Harford Maryland" first in the list. So basically we are trying to get top 5 results not just one exact match. – NoviceMe Nov 22 '14 at 07:55
  • @RealSkeptic -sorry my bad i updated the results it was typo on my part. – NoviceMe Nov 22 '14 at 09:06

1 Answers1

1

Generally, you'll have to work on this from whatever programming language you are using to call this query.

You'll need to split the query into separate words, and sanitize it (to protect against SQL injection, and to remove % characters that may influence your "like" queries. Removing punctuation if you don't have it in the actual table would do this nicely).

Then you'll have to build the query dynamically, and use each of the words as a query term in each of your fields, for example:

           AND (b.school_name LIKE '%$queryWords[0]%' 
                OR b.school_name LIKE '%$queryWords[1]%' 
                OR b.school_name LIKE '%$queryWords[2]%') 

...and so on.

It's important to note that you don't need the condition in the WHERE to be LIKE 'x%' OR LIKE '%x' OR like '%x%'. This is redundant and will unnecessarily slow the query, as all of them are included in LIKE '%x%'. The only place where it makes a difference if the match is exact or not is in the expression where you construct the priority to order by, so each of the where conditions should be as I pointed above - just LIKE '%$word%' for each of the words.

Or you may decide that for school names you only test $word[0], for school places you only check $word[1] or so on. It depends if you believe people will enter queries such as ma harford or only harford ma.

The biggest challenge here is to construct the priority. I suggest making the priority higher the higher the number, not the lower as you did, as this will allow you to sum priorities for given fields. So use ORDER BY priority DESC.

The priority expression itself is going to be rather complex:

CASE WHEN text = '$queryWords[0]' OR text = '$queryWords[1]' THEN 3 
     WHEN text LIKE '$queryWords[0]%' OR text LIKE '$queryWords[1]%' THEN 2 
     WHEN text LIKE '%$queryWords[0]%' OR text LIKE '%$queryWords[1]%' THEN 1
     ELSE 0 
END 
+
CASE WHEN params = '$queryWords[0]' OR params = '$queryWords[1]' THEN 3 
     WHEN params LIKE '$queryWords[0]%' OR params LIKE '$queryWords[1]%' THEN 2 
     WHEN params LIKE '%$queryWords[0]%' OR params LIKE '%$queryWords[1]%' THEN 1
     ELSE 0 
END 
AS priority 

(Of course if there are more words then there will be longer OR parts in each WHEN).

If you want to give more weight to the school name than to the school place, then you should change this to:

CASE WHEN text = '$queryWords[0]' OR text = '$queryWords[1]' THEN 12 
     WHEN text LIKE '$queryWords[0]%' OR text LIKE '$queryWords[1]%' THEN 8
     WHEN text LIKE '%$queryWords[0]%' OR text LIKE '%$queryWords[1]%' THEN 4
     ELSE 0 
END 
+
CASE WHEN params = '$queryWords[0]' OR params = '$queryWords[1]' THEN 3 
     WHEN params LIKE '$queryWords[0]%' OR params LIKE '$queryWords[1]%' THEN 2 
     WHEN params LIKE '%$queryWords[0]%' OR params LIKE '%$queryWords[1]%' THEN 1
     ELSE 0 
END 
AS priority 

This is essentially building the priority as a base-4 number, so that matches on text, even in the least exact option, are always higher than matches on params, even in the best match. If you add another match criterion, multiply each of the numbers by 4 and add it at the end with 3,2,1.

RealSkeptic
  • 33,993
  • 7
  • 53
  • 79
  • Thanks a lot great explanation. Only thing i did not understand is if i enter "Harvard ma" how will maryland parm show on top? As we are only checking school_name? AND (b.school_name LIKE '%$queryWords[0]%' OR b.school_name LIKE '%$queryWords[1]%' OR b.school_name LIKE '%$queryWords[2]%') – NoviceMe Nov 22 '14 at 11:09
  • I am using priority like u mentioned above: WHEN text = '$queryWords[0]' OR text = '$queryWords[1]' THEN 12 ... But it always select 12 when i search term "Harford Ma" which is only getting Harford and ignoring Ma city? – NoviceMe Nov 22 '14 at 11:21
  • It depends. If you want to prioritize the neigborhood only if the school actually matched one of the words, then the query is good as it is. Each matched school will come with whatever neigborhood, and your priority bit is going to put `ma` on top. If you want the neigborhoods that match to show up even if the school didn't match, you'll have to add them into that `OR` as well. – RealSkeptic Nov 22 '14 at 11:27
  • Sorry, I had a few missing `%` signs there. I edited and fixed. – RealSkeptic Nov 22 '14 at 11:29
  • Yes i did full plus search with second case. I will try putting "lower case" and see if that makes any difference. – NoviceMe Nov 22 '14 at 11:31
  • I tried that already by putting %. But same results it is not getting Maryland on top. It get all priority 12. It does not even go down. – NoviceMe Nov 22 '14 at 11:37
  • I figured it forgot to use Priority DESC. Thanks! – NoviceMe Nov 22 '14 at 12:05
  • You said above to remove -LIKE 'x%' OR LIKE '%x' OR like '%x%'. But if i do and search just "Harv" it does not yield any results? But it should? – NoviceMe Nov 22 '14 at 13:19
  • If you just search `LIKE '%harv%'` it will catch all: `harv`, `harvard`, `sharv` etc., so there is no point in adding the other conditions, they will just make the query slower. You may need to go lowercase. – RealSkeptic Nov 22 '14 at 13:40
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/65421/discussion-between-noviceme-and-realskeptic). – NoviceMe Nov 22 '14 at 13:55