About your request:
The user can put the ingredients in any order, delimited by any character or string (space, comma), or with no delimiter at all.
The order of the ingredients isn't a problem; we will see that later. But doing without delimiters is a very bad idea! Consider the following example (a fruit salad):
$ingredients = ['melon', 'orange', 'grape', 'apple'];
$userAnswer = 'watermelonorangegrapeapple';
The problem is obvious: with this kind of constraint there is no way to tell "melon" apart from "watermelon", which leads to false positives.
Don't forget that users are responsible for what they write and will learn from their own mistakes when they don't get the desired result. Another way is to force the user to enter the ingredients one by one, using separate input fields.
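To make the false positive concrete, here is a minimal sketch of the only check a "no delimiter" rule leaves you: a plain substring search with strpos(). The loop and the $allFound variable are just for illustration.

```php
<?php
$ingredients = ['melon', 'orange', 'grape', 'apple'];
$userAnswer = 'watermelonorangegrapeapple';

// Without delimiters, all you can ask is "does the string contain X somewhere?"
$allFound = true;
foreach ($ingredients as $ingredient) {
    if (strpos($userAnswer, $ingredient) === false) {
        $allFound = false;
    }
}

// "melon" is found inside "watermelon", so the answer is wrongly accepted
var_dump($allFound); // bool(true)
```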
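With one input field per ingredient there is nothing to split: if the fields share a name such as ingredients[] (a hypothetical field name), PHP delivers them as an array. A minimal sketch, assuming that array is passed to a hypothetical checkFields() helper (in practice it would come from something like $_POST['ingredients']):

```php
<?php
// $fields would typically come from $_POST['ingredients'] (hypothetical field name)
function checkFields(array $fields, array $ingredients): bool
{
    // Normalize each field: trim surrounding whitespace and lowercase it
    $answers = array_map(fn ($f) => mb_strtolower(trim((string) $f)), $fields);
    // Every expected ingredient must appear among the submitted fields
    return empty(array_diff($ingredients, $answers));
}

$ingredients = ['melon', 'orange', 'grape', 'apple'];
var_dump(checkFields([' Apple', 'grape', 'ORANGE', 'melon '], $ingredients));    // bool(true)
var_dump(checkFields(['apple', 'grape', 'orange', 'watermelon'], $ingredients)); // bool(false)
```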
The user's answer must include all 4 ingredients, in any order, and cannot contain typos in the ingredient names.
Why not, but this time you are too restrictive in my opinion: what if the user writes "strawberries" instead of "strawberry"? It isn't really a typo, and I think it's acceptable.
Possibilities:
Let's assume that everything is for the best in the best of all possible worlds: words are delimited and there are no typos.
As suggested in the previously linked question, you can do:
if ( preg_match('~(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b)(?=.*\bword4\b)~Ai', $userAnswer) ) {
//...
}
But it isn't the compact, straight-to-the-point solution of your dreams:
- It doesn't take delimiters into account.
- You have to build the pattern dynamically for each ingredient list (though it isn't difficult).
- Each lookahead has to go through the whole string.
- It is neither flexible nor scalable.
- If you have doubts about points 2 to 5, see point 1.
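Building the pattern dynamically is indeed not difficult. A minimal sketch, where buildLookaheadPattern() is a hypothetical helper name; preg_quote() escapes any regex metacharacters an ingredient name might contain:

```php
<?php
// Hypothetical helper: one lookahead per word, all tested from the start of the string
function buildLookaheadPattern(array $words): string
{
    $lookaheads = '';
    foreach ($words as $word) {
        // \b...\b requires a whole word, so "melon" does not match inside "watermelon"
        $lookaheads .= '(?=.*\b' . preg_quote($word, '~') . '\b)';
    }
    // A anchors the match at the start of the subject, i makes it case-insensitive
    return '~' . $lookaheads . '~Ai';
}

$ingredients = ['melon', 'orange', 'grape', 'apple'];
$pattern = buildLookaheadPattern($ingredients);
var_dump(preg_match($pattern, 'Apple, grape, orange and melon'));      // int(1)
var_dump(preg_match($pattern, 'watermelon, orange, grape and apple')); // int(0)
```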
Another approach: split the user string on the delimiter and use array_diff to check that each ingredient is present.
Basic:
$delimiter = '~ \b \s* (?: , \s* | \s and \s+ ) ~uxi';
$parts = preg_split($delimiter, $userAnswer, -1, PREG_SPLIT_NO_EMPTY);
if ( empty(array_diff($ingredients, $parts)) ) {
// all ingredients are here
}
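Wrapped in a function and run on two sample answers, the basic version separates "melon" from "watermelon" as soon as delimiters are present. The name hasAllIngredients() is only for illustration; note that array_diff() compares case-sensitively, so $ingredients is assumed to be lowercase.

```php
<?php
function hasAllIngredients(array $ingredients, string $userAnswer): bool
{
    // Split on a comma (with optional spaces around it) or on the word "and"
    $delimiter = '~ \b \s* (?: , \s* | \s and \s+ ) ~uxi';
    $parts = preg_split($delimiter, $userAnswer, -1, PREG_SPLIT_NO_EMPTY);
    // array_diff() returns the ingredients missing from $parts
    return empty(array_diff($ingredients, $parts));
}

$ingredients = ['melon', 'orange', 'grape', 'apple'];
var_dump(hasAllIngredients($ingredients, 'orange, apple, grape and melon'));      // bool(true)
var_dump(hasAllIngredients($ingredients, 'watermelon, orange, grape and apple')); // bool(false)
```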
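With sanitization:
$delimiter = '~ \b (?: [ ]? , [ ]? | [ ] and [ ] ) ~ux';
// lowercase, then replace each run of whitespace and punctuation, except commas (they are delimiters), with one space
$userAnswer = trim(preg_replace('~(?:\s|[^\P{P},])+~u', ' ', mb_strtolower($userAnswer)));
$parts = preg_split($delimiter, $userAnswer, -1, PREG_SPLIT_NO_EMPTY);
// $ingredients must be lowercase too, since array_diff() is case-sensitive
if ( empty(array_diff($ingredients, $parts)) ) {
// all ingredients are here
}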
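Run on a messy answer (mixed case, stray punctuation, a comma without a following space), the sanitized version still finds the four parts. A sketch with a hypothetical function name; the sanitizing pattern deliberately keeps commas, because turning them into spaces as well would leave nothing for the comma alternative of the delimiter to match.

```php
<?php
function hasAllIngredientsSanitized(array $ingredients, string $userAnswer): bool
{
    $delimiter = '~ \b (?: [ ]? , [ ]? | [ ] and [ ] ) ~ux';
    // Lowercase, then collapse whitespace and punctuation other than commas into single spaces
    $userAnswer = trim(preg_replace('~(?:\s|[^\P{P},])+~u', ' ', mb_strtolower($userAnswer)));
    $parts = preg_split($delimiter, $userAnswer, -1, PREG_SPLIT_NO_EMPTY);
    return empty(array_diff($ingredients, $parts));
}

$ingredients = ['melon', 'orange', 'grape', 'apple'];
// Sanitized to "orange ,apple , grape and melon", then split into 4 parts
var_dump(hasAllIngredientsSanitized($ingredients, '  Orange ,Apple!, grape AND melon. '));  // bool(true)
var_dump(hasAllIngredientsSanitized($ingredients, 'Watermelon, orange, grape and apple.')); // bool(false)
```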
With a lenient comparison between strings:
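$delimiter = '~ \b (?: [ ]? , [ ]? | [ ] and [ ] ) ~ux';
// same sanitization as above: commas are kept because they are delimiters
$userAnswer = trim(preg_replace('~(?:\s|[^\P{P},])+~u', ' ', mb_strtolower($userAnswer)));
$parts = preg_split($delimiter, $userAnswer, -1, PREG_SPLIT_NO_EMPTY);
// $callback is the fuzzy comparison function (see the example below)
if ( empty(array_udiff($ingredients, $parts, $callback)) ) {
// all ingredients are here
}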
Callback function example:
The callback functions used by array_udiff are nothing more than comparison functions for sorting an array; in other words, sorting is a necessary step under the hood to compare the two arrays. That's why comparing two items must return a negative integer, a positive integer, or 0, to determine their order.
PHP has two functions to perform a fuzzy comparison between strings: similar_text() and levenshtein().
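To get a feel for what these two functions return, a small sketch: similar_text() returns the number of matching characters (and can fill a percentage passed by reference as third argument), while levenshtein() returns the minimal number of single-character insertions, replacements and deletions.

```php
<?php
// similar_text(): number of matching characters; the 3rd argument receives a percentage
$common = similar_text('World', 'Word', $percent);
echo $common, "\n";           // 4
printf("%.2f%%\n", $percent); // 88.89%

// levenshtein(): minimal number of single-character edits
echo levenshtein('kitten', 'sitting'), "\n";          // 3
echo levenshtein('strawberry', 'strawberries'), "\n"; // 3
```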
An example using the Levenshtein distance: a distance less than 2 means that at most one character has to be replaced, inserted or deleted to make the two strings equal (see the PHP manual for more control, e.g. custom costs for each operation).
$callback = function ($a, $b) {
// "equal" when at most one edit apart, otherwise fall back to a plain string order
return levenshtein($a, $b) < 2 ? 0
: ( $a < $b ? -1 : 1 );
};
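One caveat, hedged: "distance below a threshold" is not transitive ("a" can be close to "b" and "b" close to "c" while "a" is far from "c"), so it doesn't define a consistent ordering, and the sort array_udiff performs internally may then miss matches. Also, with this threshold "strawberries" is still rejected for "strawberry", since their distance is 3. A sketch of an alternative that avoids sorting by comparing every ingredient with every part; the function name and the $maxDistance parameter are hypothetical, and the cost is one levenshtein() call per ingredient/part pair.

```php
<?php
// Hypothetical helper: true if every ingredient is within $maxDistance edits of some part
function hasAllIngredientsFuzzy(array $ingredients, array $parts, int $maxDistance = 1): bool
{
    foreach ($ingredients as $ingredient) {
        $found = false;
        foreach ($parts as $part) {
            if (levenshtein($ingredient, $part) <= $maxDistance) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            return false; // this ingredient has no close enough match
        }
    }
    return true;
}

$ingredients = ['melon', 'orange', 'grape', 'apple'];
var_dump(hasAllIngredientsFuzzy($ingredients, ['melon', 'orrange', 'grapes', 'aple']));     // bool(true)
var_dump(hasAllIngredientsFuzzy($ingredients, ['watermelon', 'orange', 'grape', 'apple'])); // bool(false)
```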
Note that these two functions may have a non-negligible cost for long strings, since similar_text() is O(max(m,n)^3) and levenshtein() is O(m*n) (m and n being the lengths of the strings). If this becomes a problem, you can also transform the strings with functions like metaphone() or soundex() before comparing them, or write a transformation of your own. This means preparing the data structure that contains the ingredients in advance (for instance by storing the transformed form of each ingredient) to make the comparison cheaper.
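A minimal sketch of that idea with soundex(), assuming the phonetic key of each ingredient is computed once in advance (e.g. when the recipe is stored); after that, an exact array_diff() on the keys is enough. Phonetic keys are coarse, so unrelated words that happen to sound alike can also match. The helper name is hypothetical.

```php
<?php
// Computed once in advance: ingredient name => phonetic key
$ingredients = ['melon', 'orange', 'grape', 'strawberry'];
$ingredientKeys = array_combine($ingredients, array_map('soundex', $ingredients));

// Hypothetical helper: compare phonetic keys instead of raw strings
function hasAllIngredientsPhonetic(array $ingredientKeys, array $parts): bool
{
    $partKeys = array_map('soundex', $parts);
    // array_diff() keeps the entries of $ingredientKeys whose key is missing
    return empty(array_diff($ingredientKeys, $partKeys));
}

// "strawberries" and "strawberry" share the key S361; "watermelon" (W365) is not "melon" (M450)
var_dump(hasAllIngredientsPhonetic($ingredientKeys, ['strawberries', 'orange', 'grape', 'melon']));    // bool(true)
var_dump(hasAllIngredientsPhonetic($ingredientKeys, ['strawberry', 'orange', 'grape', 'watermelon'])); // bool(false)
```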