
I'm building a filter system with PHP. I need to prioritize texts based on keyword matches. The filter has to recognize different types of keywords.

One type is plain words: keyword1 keyword2. This matches texts containing both 'keyword1' and 'keyword2', regardless of their order or whether they occur consecutively in the text.

Another type is an exact combination of words: "keyword1 keyword2". This gives priority to articles containing the exact phrase "keyword1 keyword2".

There are other types but they aren't relevant here.

Keyword types may be combined, so keyword1 "keyword2 keyword3" is valid and would search for articles with both "keyword1" and the exact combination "keyword2 keyword3".
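Once a query has been split into terms, the rule above reduces to one check: every term must occur in the text, and a quoted phrase must occur verbatim. The sketch below assumes the query is already tokenized. The function name `matchesAllTerms` is hypothetical, and it uses case-insensitive substring search, which is an assumption rather than part of the question.

```php
<?php
// Hypothetical helper (not from the question): true only if every term occurs in $text.
// A loose word and a quoted phrase are checked the same way once tokenized:
// each must appear somewhere, and a phrase must appear verbatim.
// Assumption: case-insensitive substring search, so 'keyword1' also matches 'keyword10'.
function matchesAllTerms(string $text, array $terms): bool
{
    foreach ($terms as $term) {
        if (stripos($text, $term) === false) {
            return false;
        }
    }
    return true;
}

$terms = ['keyword1', 'keyword2 keyword3'];

var_dump(matchesAllTerms('keyword1 appears before keyword2 keyword3', $terms)); // bool(true)
var_dump(matchesAllTerms('keyword3 keyword2, then keyword1', $terms));          // bool(false)
```

For real word-boundary matching you would likely need a regex with `\b` instead of `stripos()`.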

For the first type, I can use explode(' ', $keywords) to get the keywords as an array. However, this breaks on keyword1 "keyword2 keyword3", because the text inside the quotation marks gets split as well.
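To make the problem concrete, here is what a plain `explode()` does to the combined query: the quoted phrase is split in two and each half keeps a stray quote character.

```php
<?php
$keywords = 'keyword1 "keyword2 keyword3"';

// explode() splits on every space, including the one inside the quotes.
$parts = explode(' ', $keywords);

// $parts is ['keyword1', '"keyword2', 'keyword3"']
print_r($parts);
```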

So I need a function that separates the keywords, but doesn't separate the text in quotation marks. Is there a function that can do that? If not, is a regex the way to go?

  • Yes, regex could do what you want. What have you tried so far? – MCL May 31 '13 at 07:29
  • @MCL I can write a regex, but I would like to know if there's a built-in function to do this. I don't ask you to write the regex, no `send-me-teh-codez` ;) –  May 31 '13 at 07:30
  • You can write your own parser, or make a match with regex. (I wouldn't recommend explode) – Javier Diaz May 31 '13 at 07:31
  • Have a look at [this answer](http://stackoverflow.com/a/2202489/1282023). – MCL May 31 '13 at 07:33
  • Yes, duplicate of [PHP explode the string, but treat words in quotes as a single word](http://stackoverflow.com/questions/2202435/php-explode-the-string-but-treat-words-in-quotes-as-a-single-word) - sorry! –  May 31 '13 at 07:35

1 Answer


You could use regex:

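```php
$string = 'test1 test2 "test3 test4"';

// Group 1: a double-quoted phrase (lazy +? so each quoted phrase is matched separately)
// Group 2: any other run of non-whitespace characters
preg_match_all('/("[\s\S]+?")|(\S+)/', $string, $matches);

print_r($matches[0]); // test1, test2, "test3 test4"
```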
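Because the question needs to treat loose words and exact phrases differently, the capture groups can also be used to sort each match into its type. This is a sketch under the assumption that phrases are wrapped in double quotes and contain no escaped quotes; the function name `parseKeywords` and its return shape are hypothetical.

```php
<?php
// Hypothetical helper: split a query into loose words and exact phrases in one pass.
// Assumptions: phrases use double quotes and contain no escaped quotes.
function parseKeywords(string $query): array
{
    $words = [];
    $phrases = [];

    // Group 1 captures the text between double quotes; group 2 captures a bare word.
    preg_match_all('/"([^"]+)"|(\S+)/', $query, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        if ($m[1] !== '') {
            $phrases[] = $m[1]; // quoted phrase, quotes stripped
        } else {
            $words[] = $m[2];   // ordinary keyword
        }
    }

    return ['words' => $words, 'phrases' => $phrases];
}

print_r(parseKeywords('keyword1 "keyword2 keyword3" keyword4'));
```

With `PREG_SET_ORDER`, a bare-word match reports group 1 as an empty string, which is why the check is `$m[1] !== ''`.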

Alternatively, you could use `str_getcsv()` with a space as the delimiter, which treats text inside double quotes as a single field.
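A minimal sketch of the `str_getcsv()` approach, treating the query as a single CSV record whose fields are separated by spaces. The escape character is passed explicitly to avoid relying on its default value.

```php
<?php
$string = 'test1 test2 "test3 test4"';

// Space as separator, double quote as enclosure, backslash as escape character.
// The enclosure keeps "test3 test4" together and removes the quotes.
$terms = str_getcsv($string, ' ', '"', '\\');

print_r($terms); // test1, test2, test3 test4
```

Two caveats: consecutive spaces produce empty strings in the result, so you may want to filter them out, and because the quotes are removed you can no longer tell which terms were phrases.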

Phil Cross
  • Thanks! Is there a reason (performance?) I would use the regex instead of str_getcsv? –  May 31 '13 at 07:38
  • Not really sure; I haven't tested the performance, unfortunately. You might get better long-term performance using `str_getcsv()`, but I think it's mainly just preference. – Phil Cross May 31 '13 at 07:41
  • There's a missing `(`. Also, had to modify this to work with multiple strings in quotes like this: `preg_match_all('/(\"[\s\S]+?\")|(\'[\s\S]+?\')|([\S]+)/', $string, $matches);` – Alexander Oct 12 '20 at 09:39
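The key change in the comment above is the lazy quantifier `+?`. The comparison below shows the difference on a query with two quoted phrases: the greedy pattern runs to the last closing quote and merges everything in between, while the lazy one stops at the first closing quote.

```php
<?php
$string = 'a "b c" d "e f"';

// Greedy: [\s\S]+ extends to the LAST double quote, swallowing ' d '.
preg_match_all('/("[\s\S]+")|(\S+)/', $string, $greedy);

// Lazy: [\s\S]+? stops at the FIRST closing quote, so each phrase is separate.
preg_match_all('/("[\s\S]+?")|(\S+)/', $string, $lazy);

print_r($greedy[0]); // matches: a | "b c" d "e f"
print_r($lazy[0]);   // matches: a | "b c" | d | "e f"
```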