-1

I have a big array that contains string values as the data. I want to optimize this array, so I can perform queries that check if a certain string exists in the array as quickly as possible. So let's say I create the array with $arr= []; and then add values like this:

foreach($names as $name)
    $arr[]= $name;

And now I want to perform a lot of queries like if(in_array($random_string, $arr)), but it's pretty slow. I'd like to add some indexing for the array to optimize the performance. Should I simply use the sort() function for the array?

How to optimize an array with string data for checking if a string exists queries?

EDIT: No, obviously this is not a duplicate of "what is faster: in_array or isset? [closed]", and you can see that already by the answer by vivek_23.

Jerry
  • 3
  • 4
  • Are values unique? Do you have to find all matches or just see if exists? If you could set index and use `isset` that would be a bit more performant. [ref](https://stackoverflow.com/questions/13483219/what-is-faster-in-array-or-isset) – ficuscr May 24 '18 at 21:52
  • Yes, values are unique and I can use them as array keys instead of array values if that's the most efficient solution. Anyway, should I sort the array? Does it help with indexing? – Jerry May 24 '18 at 22:11
  • 2
    My understanding is `isset` will be O(1) hash search where `in_array` will check every value until it finds a match. isset also has less call overhead. You would not need to sort if setting array index to the name values. See hashtable. – ficuscr May 24 '18 at 22:33
  • 1
    Possible duplicate of [what is faster: in\_array or isset?](https://stackoverflow.com/questions/13483219/what-is-faster-in-array-or-isset) – mickmackusa May 24 '18 at 23:22

1 Answers1

1
  • I would suggest you to do sorting with binary search to know if a value exists. Time Complexity will be O(N log N) for sorting and O(log N) to search each individual element, where N is the number of elements in the array.

  • You could also create an associate array and check with the help of isset() to see if the value exists. But, hashing keys would make PHP to internally manage the hash structure consuming a bit of memory since you have big string arrays. Also, using isset($arr['some_key']) may not be necessarily an O(1) operation because of collisions.

Below is my code which uses binary search approach-

<?php

function checkIfValueExists($arr,$search_value){

    $low  = 0;
    $high = count($arr) - 1;

    while($low <= $high){
        $mid = $low + intval(($high - $low) / 2);
        $compare_result = strcmp($arr[$mid],$search_value);
        if($compare_result === 0) return true;
        else if($compare_result < 0) $low = $mid + 1;
        else $high = $mid - 1;
    }

    return false;
}

The driver code to test above function-

<?php 

$arr = array();

$str = "abcdefghijklmnopqrstuvwxyz";

$values_to_check = array();

for($i=1;$i<=50000;++$i){
    $str_length = rand(1,50);
    $new_str = "";
    while($str_length-- > 0){
        $new_str .= $str[rand(0,25)];
    }

    $arr[]  = $new_str;
    if(rand(0,1) === 1){
        $values_to_check[] = rand(0,1) === 1 ? $new_str . $str[rand(0,25)] : $new_str;
    }
}

// sort the array of strings.

sort($arr);

// test the functionality

foreach($values_to_check as $each_value){
    var_dump(checkIfValueExists($arr,$each_value));
    echo "<br/>";
}
nice_dev
  • 17,053
  • 2
  • 21
  • 35