
I have two arrays, A and B. Array A contains ~300,000 string records, e.g.

    [0] => 'apple',
    [1] => 'pineapple',
    [2] => 'orange',
    ...
    [299999] => 'banana'

while Array B contains 100,000 string values, e.g.

    [0] => 'bamboo',
    [1] => 'banana',
    [2] => 'boy',
    [3] => 'ball',
    [4] => 'balloon',
    [5] => 'bazooka',

The question is: how do I find the common values between the two arrays?

array_intersect() seems a promising function, but I worry about its performance. Would it be better to write the two arrays to text files and compare the files instead? Or am I worrying too much?

Code using array_intersect():

    $result_array = array_intersect($arrayA, $arrayB);
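A minimal, self-contained run of that call on the sample values from the question (small stand-ins for the real ~300,000 and 100,000 element arrays) shows what comes back: the matching values of `$arrayA`, with `$arrayA`'s original keys preserved.

```php
<?php
// Small stand-ins for the question's $arrayA (~300,000 items) and $arrayB (100,000 items).
$arrayA = ['apple', 'pineapple', 'orange', 'banana'];
$arrayB = ['bamboo', 'banana', 'boy', 'ball', 'balloon', 'bazooka'];

// array_intersect() keeps the values of $arrayA that also occur in $arrayB,
// comparing them as strings; the keys from $arrayA are preserved.
$result_array = array_intersect($arrayA, $arrayB);
print_r($result_array);   // contains [3 => 'banana']

// Reindex with array_values() if a plain 0-based list is wanted.
$common = array_values($result_array);
print_r($common);         // contains [0 => 'banana']
```

Because the keys are preserved, code that loops over the result with `for ($i = 0; ...)` would miss entries; iterate with `foreach` or reindex first.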
Raptor
  • Have you tested to see what performance might be? – Jared Farrish Mar 12 '13 at 02:58
  • If you have two arrays, `array_intersect` is probably the most efficient way to find the intersection. Writing them to files doesn’t really make much sense. – Ry- Mar 12 '13 at 03:01
  • Unless you're planning to rely on the OS's diff command (assuming it's available), array_intersect should be your best bet – rantsh Mar 12 '13 at 03:03
  • https://ignite.io/code/513e9afcec221ebe52000000 Seems quick enough? – Jared Farrish Mar 12 '13 at 03:03
  • Yes, `array_intersect()` is the choice! It completed the operation in less than 1 second. – Raptor Mar 12 '13 at 03:10
  • Indeed, it is the best solution. Writing your data to a file will take more time than processing it in memory :) – MatRt Mar 12 '13 at 03:13

2 Answers


Based on my own test, array_intersect() is the choice: it produced the result in less than 1 second. Its time complexity is O(n·log n), as reported in the reference below.

Reference: https://stackoverflow.com/a/6329494/188331
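The "less than 1 second" result can be re-checked with a rough benchmark along these lines. It is a sketch on synthetic data: the `word…` strings and the 50,000-element overlap are assumptions, not the asker's data. For comparison it also times a hash-lookup alternative built from `array_flip()` and `isset()`, which does not need to sort anything. Absolute timings will vary with the PHP version and the machine.

```php
<?php
// Hypothetical synthetic data sized like the question: 300,000 strings in A,
// 100,000 in B, overlapping on 'word250000' .. 'word299999' (50,000 values).
$arrayA = [];
for ($i = 0; $i < 300000; $i++) {
    $arrayA[] = 'word' . $i;
}
$arrayB = [];
for ($i = 250000; $i < 350000; $i++) {
    $arrayB[] = 'word' . $i;
}

// Approach 1: array_intersect() (values compared as strings, keys of $arrayA kept).
$t0 = microtime(true);
$viaIntersect = array_intersect($arrayA, $arrayB);
$t1 = microtime(true);

// Approach 2, for comparison: array_flip() turns B's values into hash keys,
// so each membership test is a single isset() lookup.
$lookup = array_flip($arrayB);
$viaLookup = [];
foreach ($arrayA as $key => $value) {
    if (isset($lookup[$value])) {
        $viaLookup[$key] = $value;   // keep A's key, like array_intersect()
    }
}
$t2 = microtime(true);

printf("array_intersect: %d matches in %.3f s\n", count($viaIntersect), $t1 - $t0);
printf("flip + isset:    %d matches in %.3f s\n", count($viaLookup), $t2 - $t1);
```

Both approaches return the same key/value pairs for string data; either way the whole job stays in memory, which is why writing the arrays to files first is unlikely to help.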

Raptor

The array_intersect() function retrieves the values common to the arrays.

But as the arrays are huge, you need to configure the script's execution time and memory limits with performance in mind:

    set_time_limit(0);
    ini_set('memory_limit','128M');

The snippet above removes the execution time limit (0 means no limit) and raises the memory limit so that enough memory is available to hold the large arrays.
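A short sketch for confirming that these settings actually took effect, and for seeing how much memory the script really needed, might look like this. `ini_set()` returns the old value on success and `false` on failure; the placeholder comment marks where the array work would go.

```php
<?php
// Remove the execution time limit (0 = no limit) and raise the memory limit.
set_time_limit(0);
$previous = ini_set('memory_limit', '128M');

// ini_set() returns the previous value, or false if the change was rejected.
var_dump($previous !== false);
echo 'max_execution_time: ', ini_get('max_execution_time'), PHP_EOL;
echo 'memory_limit:       ', ini_get('memory_limit'), PHP_EOL;

// ... build $arrayA / $arrayB and call array_intersect() here ...

// Peak memory allocated by the script, in MB, to judge whether 128M is needed at all.
printf("peak usage: %.1f MB\n", memory_get_peak_usage(true) / 1024 / 1024);
```

Note that on the command line `max_execution_time` already defaults to 0, so `set_time_limit(0)` mainly matters for scripts run through a web server.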

Rubin Porwal
  • If it ain’t broke, don’t fix it. A total of ~400,000 strings of about that length is only going to be about 28MB. – Ry- Mar 12 '13 at 03:03
  • @minitech - If it ain't broke, try try again. Then walk away slowly. No one will know. – Jared Farrish Mar 12 '13 at 03:10
  • @minitech: 38.5 MB for me here, and it doesn't change until the strings become *really long* (`$o = memory_get_usage(true); $a = array_fill(0, 300000, 'apple'); var_dump((memory_get_usage(true) - $o) / 1024 / 1024);`) – zerkms Mar 12 '13 at 04:07
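The two comments disagree on the figure (about 28MB estimated vs 38.5 MB measured), and the measurement above reuses one repeated string. A sketch like the following measures arrays shaped like the question's instead, with distinct values; the `fruit…`/`thing…` strings are made-up stand-ins. The number you get depends heavily on the PHP version, since PHP 7 and later store arrays and strings much more compactly than PHP 5 did.

```php
<?php
// Rough memory check: 300,000 + 100,000 distinct short strings, as in the question.
$before = memory_get_usage();

$arrayA = [];
for ($i = 0; $i < 300000; $i++) {
    $arrayA[] = 'fruit' . $i;      // hypothetical ~10-character values
}
$arrayB = [];
for ($i = 0; $i < 100000; $i++) {
    $arrayB[] = 'thing' . $i;      // hypothetical ~10-character values
}

// Memory handed out by PHP's allocator for the two arrays, in MB.
$usedMb = (memory_get_usage() - $before) / 1024 / 1024;
printf("~%.1f MB for %d strings\n", $usedMb, count($arrayA) + count($arrayB));
```

Unlike the `array_fill()` version, every element here is its own string, so the result also includes the per-string overhead.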