0

If one searches for vienna university, my script looks for the presence of vienna and university in some MySQL columns.

Now, I want to modify it in the following manner: If the query contains multiple words starting and ending with quotation marks (e.g. "vienna university"), the script should not separate the words, but rather search for the exact match of vienna university in the database.

And if the query is Meyer "vienna university", the script should search for the presence of Meyer and vienna university in the MySQL table.

Would you know how to make this happen?

My code so far:

<?php
$search_keyword = $_GET['query'];
//multiple keywords
$search_keyword = preg_split('/[\s]+/', $search_keyword);
$totalKeywords = count($search_keyword);
            
$sql = "SELECT * FROM `editors`
        WHERE CONCAT_WS(`journal`, `editor`, `affiliation`)
        LIKE ?";
        
//...for multiple keywords
for($i=1 ; $i < $totalKeywords; $i++){
    $sql .= " AND CONCAT_WS(`journal`, `editor`, `affiliation`)
              LIKE ?";
}
?>
Dharman
  • 30,962
  • 25
  • 85
  • 135
anpami
  • 760
  • 5
  • 17
  • Check if https://stackoverflow.com/questions/32034567/split-string-on-spaces-except-words-in-quotes answers the problem. (or https://stackoverflow.com/questions/2202435/php-explode-the-string-but-treat-words-in-quotes-as-a-single-word ) – Nigel Ren Dec 20 '20 at 21:20

2 Answers2

3

This is easily done with PHP preg_* functions; if we do it in steps:

  1. Match and remove the quoted strings from the search query
  2. Split the remaining words on spaces
  3. Merge the two results together

Code

I've wrapped the code in a function for convenience...

function splitWords($query){

    // Matches a single or double quote followed by any character  or more
    // times followed by the same quote as in group 1. i.e. if the first capture
    // a single quote then \1 == '
    $regex = '/(\'|")(.*?)\1/';

    // Find all matches
    // Matches for the second capture group will be returned in index [2]
    preg_match_all($regex, $query, $matches);

    // Remove quoted strings from the query
    // Split on spaces IF there's still words
    if( $query = trim(preg_replace($regex, '', $query)) )
        $arr = preg_split( "/\s+/", $query );

    // Return array of split words followed by quoted strings
    // $arr ?? [] ==> If $arr doesn't exist then $arr = []
    return array_merge($arr ?? [], $matches[2]);
}


print_r( splitWords("words 'in quotes' more letters 'and quotes' and \"other quotes\" ") );

print_r( splitWords('"Anna Huber"') );

print_r( splitWords('Anna') );

Output

Array
(
    [0] => words
    [1] => more
    [2] => letters
    [3] => and
    [4] => in quotes
    [5] => and quotes
    [6] => other quotes
)

Array
(
    [0] => Anna Huber
)

Array
(
    [0] => Anna
)
Steven
  • 6,053
  • 2
  • 16
  • 28
  • Thank you, almost there! However, it seems to return an empty string as the first array-element when there are quotation marks. If the query is `"Anna Huber"`, there seems to be an empty string first, and then (as the 2nd element) `Anna Huber`. At least if I do `$search_keyword = splitWords($search_keyword)` and then `foreach((array) $search_keyword as $keys => $search_keyword) [...]`, it loops through *two* elements, for some reason, even though there should only be one (i.e. `Anna Huber`). Strangely, `print_r($search_keyword)` only shows `Anna Huber` as the sole element, though. – anpami Dec 21 '20 at 09:33
  • @anpami see updated code for a working fix! – Steven Dec 21 '20 at 13:12
1

Use a FULLTEXT index:

MATCH(col) AGAINST("vienna university") -- either word
MATCH(col) AGAINST("+vienna +university" IN BOOLEAN MODE) -- both words
MATCH(col) AGAINST('+"vienna university"') -- both words together

Yes, you would need to parse your request to rebuild it this way. Yes, there are potential issues with punctuation, stopwords, minimum word length, etc.

It is very fast.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • After first having used Steven's response, I moved to your solution, Rick, because it is indeed much faster & much more efficient in terms of boolean search terms. I used it in conjunction with the [PHP/MySQL Boolean Search Parser](https://github.com/DuncanOgle/BooleanSearchParser) and the [relevant PDO codes](https://stackoverflow.com/questions/13682355/pdo-and-mysql-fulltext-searches). Thank you. However, I still think it is fair to leave the accpetance mark with Steven's response because of his effort to address the narrow scope of my question. Nevertheless, thank you so much, Rick! – anpami Jan 14 '21 at 13:00