Due to a weird set of circumstances, I need to determine if a value exists in a known set, then take an action. Consider:

An included file will look like this:

// Start generated code
$set = array();
$set[] = 'foo';
$set[] = 'bar';
// End generated code

Then another file will look like this:

require('that_last_file.php');

if(in_array($value, $set)) {
  // Do thing
}

As noted, the creation of the array will be from generated code -- a process will create a PHP file which will be included above the if statement with require.

How concerned should I be about the size of this mess -- both in bytes, and array values? It could easily get to 5,000 values. Is there a more efficient way to search for the value than using in_array on an array? How painful is including a 5,000-line file via require?

I know there are ultimately better ways of doing this, but my limitation is that the set creation and logic have to be in an included PHP file. There are odd technical restrictions that prevent other options (e.g., a database lookup).

Krupal Panchal
Deane
  • _"How concerned should I be about the size of this mess -- both in bytes, and array values?"_ - negligibly concerned – Sean Bright Nov 19 '19 at 14:33
  • As @SeanBright said, you shouldn't concern yourself with the size of the array. The only problem could come from the size of the individual values, if they are really huge, but in that case, 100 elements or 10000 elements, it won't matter. – Martin Dimitrov Nov 19 '19 at 14:36
  • 500,000 takes about .01 seconds I guess depending on hardware/resources etc. https://3v4l.org/nrKQL – AbraCadaver Nov 19 '19 at 14:38
  • If you're really concerned I'd say test it. You'll likely find that there is nothing to be concerned about. Your only "real" limitation is memory available. – Dave Nov 19 '19 at 14:38
  • @MartinDimitrov The array values will be integers or strings (I can do either). No more than five digits. – Deane Nov 19 '19 at 14:40
  • @Deane Then it will not matter. You can make it a million values and it will not be felt. – Martin Dimitrov Nov 19 '19 at 14:41
  • A SQLite database works wherever PHP is running and is the better solution! – jspit Nov 19 '19 at 14:51
  • Maybe have a look at this as well: [what is faster: in\_array or isset?](https://stackoverflow.com/questions/13483219/what-is-faster-in-array-or-isset) (isset needs the values to be used as array _keys_, of course, but depending on the specific situation, creating one from the original data array and then working with that might be an option.) – 04FS Nov 19 '19 at 15:01
  • If the values were sorted, you could do a binary search potentially, but it's probably not worth it. – Sean Bright Nov 19 '19 at 15:01
  • @04FS I love those stats, thank you. But I'm actually not setting keys, just values. Does it matter? If I don't specify the key, just the value, what does the key get set as? If checking for keys is faster, should I do something like `$set['foo'] = true;`? – Deane Nov 19 '19 at 15:11
  • Yes, I'd do `$set['foo'] = true;` if possible. – Adder Nov 19 '19 at 15:28
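
The keyed layout suggested in the comments above could look roughly like the sketch below. It assumes the generator can be changed to emit keys instead of values; the helper name `in_set` is hypothetical and not part of the original question. For reference, `$set[] = 'foo';` assigns the next free integer key (0, 1, 2, ...), so in the original layout the members are only reachable by scanning the values.

```php
<?php
// Start generated code (assumed variant: members are stored as KEYS)
$set = [];
$set['foo'] = true;
$set['bar'] = true;
// End generated code

// Hypothetical helper, introduced here for illustration only.
function in_set(array $set, $value): bool
{
    // isset() performs a hash lookup on the key; it does not scan the array.
    return isset($set[$value]);
}

$value = 'foo';
if (in_set($set, $value)) {
    // Do thing
}
```

One thing to keep in mind with this layout: PHP casts string keys that look like decimal integers (such as `'123'`) to integer keys, so `'123'` and `123` refer to the same member.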

1 Answer

A faster way would be:

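// Flip once so the values become keys, then test the key with isset().
// isset() avoids the "undefined index" notice that reading a missing key raises.
$lookup = array_flip($set);
if (isset($lookup[$value])) {
    // Do thing
}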

A 5,000-value array really isn't that bad, though, if it's just strings.

Shardj
  • Can you explain that further? What **exactly** do you mean by faster? – Nico Haase Nov 19 '19 at 15:37
  • I mean if you benchmark this, for numeric arrays this is typically the fastest way to check if a value exists in an array – Shardj Nov 19 '19 at 16:08
  • Please back that with real numbers. – Nico Haase Nov 19 '19 at 16:17
  • Really? It isn't difficult to go benchmark this for yourself if you're that interested. This is a well known way to optimise array searching through numeric arrays; key lookups are far faster than in_array. – Shardj Nov 19 '19 at 16:34
  • If it was that easy to provide a proper benchmark, you could have done that to enhance your answer. An example benchmark at https://gist.github.com/ksimka/21a6ff74b41451c430e8 shows that `array_flip` in combination with `isset` is the slowest option for an array of up to 10k elements. I've modified it at https://3v4l.org/IhZ8F and `in_array` is still way faster. So, just for completeness: what have I done wrong in that simple benchmark? – Nico Haase Nov 20 '19 at 07:15
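
For anyone who wants to measure this locally, here is a minimal benchmark sketch. It is not the code from either link above. The array size, the needle, the iteration count, and the helper `time_it` are all assumptions chosen for illustration. It separates three cases, because the disagreement in these comments may come down to whether the `array_flip` call is counted in every lookup or done once up front.

```php
<?php
// Build a 5,000-element list of the kind the generated file would contain.
$n = 5000;
$list = [];
for ($i = 0; $i < $n; $i++) {
    $list[] = 'v' . $i;
}
$needle = 'v' . ($n - 1); // last element: worst case for a linear scan
$iterations = 1000;

// Hypothetical helper: run $fn $iterations times, return elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$keyed = array_flip($list); // flipped once, outside any timed loop

$results = [
    'in_array' => time_it(function () use ($list, $needle) {
        return in_array($needle, $list, true);
    }, $iterations),
    'array_flip + isset, per lookup' => time_it(function () use ($list, $needle) {
        $flipped = array_flip($list); // flip cost paid on every lookup
        return isset($flipped[$needle]);
    }, $iterations),
    'isset on pre-flipped array' => time_it(function () use ($keyed, $needle) {
        return isset($keyed[$needle]);
    }, $iterations),
];

foreach ($results as $label => $seconds) {
    printf("%-32s %.6f s\n", $label, $seconds);
}
```

Timings depend on the PHP version and hardware, so no expected numbers are given here. In the scenario from the question, the generated file could emit the keyed array directly, which would correspond to the third case.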