1

I have two arrays which look like the following:

$arr1 = ("stringType1AndSomeRandomStuff",
         "stringType2AndSomeRandomStuff",
         "stringType3AndSomeRandomStuff",
         "stringType1AndSomeRandomStuff",
         "stringType2AndSomeRandomStuff",
         "i don't belong here at all!",
         "stringType4AndSomeRandomStuff");

In this first array ($arr1), most of the keys have some sort of common attribute. In the example text above, this would be stringTypeX. This 'common factor' is what I need to search by. Each string also has some sort of extra data exemplified by AndSomeRandomStuff.

The second array looks like this:

$arr2 = ("stringType1" => "category1",
         "stringType2" => "category2",
         "stringType3" => "category3",
         "stringType4" => "category4");

I need to go through each string in $arr1 and see if it closely matches any of the keys in $arr2. If it matches for one of the keys, I need the value of the key from $arr2.

How can I iterate through each of the strings in $arr1 and determine which (if any) of the keys in $arr2 apply? Basically, I need to go through every string in $arr1 and perform a partial match on all of the keys in $arr2, to find the closest match. The immediate solution that comes to mind is to use two loops (outer for everything in $arr1 and inner for each in $arr2), but is there a function in PHP that can take a string and see if it matches any string in an existing array? Does anybody know of a more performant way to do this?

Naftali
  • 144,921
  • 39
  • 244
  • 303
user1521764
  • 137
  • 3
  • 11

1 Answers1

3

Map $arr1 to a function that calculates the string-edit-distance to the keys in $arr2, and then returns the closest match. Take a look at this Levenshtein distance function. Or, you could simply do a startsWith comparison in your mapping function.

You'll likely have something that looks like this:

$stringEditDistanceThreshold = 5; // greater than this means rejected

// define the mapping function
function findClosestMatchingString($s) {
    $closestDistanceThusFar = $stringEditDistanceThreshold + 1;
    $closestMatchValue      = null;

    foreach ($arr2 as $key => $value) {
        $editDistance = levenshtein($key, $s);

        // exact match
        if ($editDistance == 0) {
            return $value;

        // best match thus far, update values to compare against/return
        } elseif ($editDistance < $closestDistanceThusFar) {
            $closestDistanceThusFar = $editDistance;
            $closestMatchValue      = $value;
        }
    }

    return $closestMatch; // possible to return null if threshold hasn't been met
}

// do the mapping
$matchingValues = array_map('findClosestMatchingString', $arr1);

You'll likely have to tune the $stringEditDistanceThreshold until you get values that you're happy with. Or you could use a startsWith function, which would greatly simplify what findClosestMatchingString has to do.

Finally, this isn't very efficient. It's effectively an ugly nested loop. You may be able to do some pruning or something else clever, but I suspect if the arrays are fairly small, you may not care.

EDIT: As stated by @Ohgodwhy in the comment below, preg_grep will likely work even better for you. In that case, your map function will look something like the following:

function findFirstMatchingString($s) {
    $matchingKeys = preg_grep($s, array_keys($arr2));

    if (!empty($matchingKeys) {
        // return the value of the first match
        return $arr2[$matches[0]];
    }

    return null;
}
Community
  • 1
  • 1
Mike Cialowicz
  • 9,892
  • 9
  • 47
  • 76
  • 2
    Would it not be easier to `preg_grep`, then on the returned flags simply `preg_split()` and compare that to the key, and get the value? – Ohgodwhy Jul 18 '13 at 01:04
  • 1
    @Ohgodwhy: yep, preg_grep is likely way better. I didn't think of that. I suppose I had more of a 'fuzzy' match example in my mind. – Mike Cialowicz Jul 18 '13 at 01:17