6

We have been using MySQL Fulltext search for several years now, but the requirements have changed. We want to parse the AND/OR/NOT parameters to the form that MySQL does understand. I've written a unit test and it became clear that this is quite complicated.

I'm sure more people run into this problem, so I suppose there must be some kind of library that is able to do this for me. I've tried Google, but unfortunately, I was not able to find such a library. Does anybody know a good one?

The library should be able to handle quotes, parenthesis, AND/OR/NOT operators, and in our case it should default to AND instead of OR (if no operator is set). Here are some of my expected results:

  • 'ict' becomes '+ict'
  • 'ict it' becomes '+ict +it'
  • 'ict OR it' becomes 'ict it'
  • 'NOT ict' becomes '-ict'
  • 'it NOT ict' becomes '+it -ict'
  • 'web AND (ict OR it)' becomes '+web +(ict it)'
  • 'ict OR (it AND web)' becomes 'ict (+it +web)'
  • 'ict NOT (ict AND it AND web)' becomes '+ict -(+ict +it +web)'
  • 'php OR (NOT web NOT embedded ict OR it)' becomes 'php (-web -embedded ict it)'
  • '(web OR embedded) (ict OR it)' becomes '+(web embedded) +(ict it)'
  • develop AND (web OR (ict AND php))' becomes '+develop +(web (+ict +php))'
  • '"ict' becomes '+"ict"'
  • '"ict OR it"' stays '+"ict OR it"'

This is the function we used in the last years (which does not work properly):

/**
 * Parses search string.
 * @param string $s The unparsed search string.
 * @return string $s The parsed search string.
 */
public function parseSearchString( $s )
{
    // Place a space at the beginning.
    $s = ' ' . $s;

    // AND - Remove multiple spaces, AND, &.
    $s = preg_replace( '/\s\s+/', ' ', $s );
    $s = preg_replace( '/\sAND\s/i', ' ', $s );
    $s = preg_replace( '/\s&\s/', ' ', $s );

    // OR - Make replacements. Execute double, so we replace all occurences.
    $s = preg_replace( '/(\w+)\s(?:OR|\|)\s(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s(?:OR|\|)\s(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s*(?:\\\|\\/)\s*(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s*(?:\\\|\\/)\s*(\|?\w+)/i', '|\\1|\\2', $s );

    // NOT
    $s = preg_replace( '/\bNOT\s(\w+)/i', '|-\\1', $s );

    // Quoted strings.
    $s = preg_replace( '/\s"/', ' +"', $s );

    // Place + in front of words.
    $s = preg_replace( '/\s(\w+)/', ' +\\1', $s );

    // Replace | to spaces.
    $s = preg_replace( '/\|/', ' ', $s );

    return trim( $s );
}
dbroeks
  • 180
  • 1
  • 9
  • Personally, I think this is a MARVELOUS question, and am curious to know if you ever found a solution – GDP Nov 21 '17 at 04:08
  • Back then, I wrote a custom keyword tokenizer and parser in php, which did all the magic. It worked quite well for us, but it wasn't that pretty. We never published the code. We've been using it for a few years, before switching to Elasticsearch. – dbroeks Nov 21 '17 at 08:55
  • 2
    Interestingly enough, someone else saw your question and wrote a class to answer it - and quite nicely I might add. They even used the same sample data. https://github.com/DuncanOgle/BooleanSearchParser – GDP Nov 22 '17 at 13:24

1 Answers1

0

This probably doesn't answer your question but if you need more advanced full-text search capabilities I'd recommend using something like sphinx or apache solr. https://stackoverflow.com/a/1297561/2170561

Community
  • 1
  • 1
null
  • 68
  • 1
  • 6
  • Thank you for your answer. We are aware of solutions like Sfinx, Xapian and Solr. But for now, we would like to stay with the full-text search. – dbroeks Apr 15 '13 at 14:27
  • 1
    Someone made a couple classes to parse strings for BOOLEAN mysql queries. https://github.com/DuncanOgle/BooleanSearchParser – abetwothree Sep 19 '19 at 18:56