Is there an efficient way to combine the return arrays of array_count_values($arr1) and array_count_values($arr2) if $arr1 and $arr2 have elements with the same value?
I'm trying to work through the classic problem: "generate the top 100 search requests from a document that contains 1 billion lines of search requests."
My approach is to use the Unix split command to chop the document into smaller files, count the occurrences of each search term in each file with array_count_values, then reduce all those files into a single file that lists the search queries in descending order of popularity.
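To make the plan concrete, here is a minimal sketch of the count-then-reduce step. It assumes each chunk holds one search request per line; the helper names countChunk and topQueries are hypothetical, and the two temporary files only stand in for the output of split.

```php
<?php
// Hypothetical helper: counts how often each line occurs in one chunk file.
function countChunk(string $path): array
{
    // Assumption: one search request per line; empty lines are skipped.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return array_count_values($lines);
}

// Hypothetical helper: folds the per-chunk counts into one running total
// and returns the $n most frequent queries.
function topQueries(array $chunkPaths, int $n): array
{
    $totals = array();
    foreach ($chunkPaths as $path) {
        foreach (countChunk($path) as $query => $count) {
            $totals[$query] = ($totals[$query] ?? 0) + $count;
        }
    }
    arsort($totals);                          // descending by count, keys kept
    return array_slice($totals, 0, $n, true); // keep only the top $n entries
}

// Demo: two small temporary files stand in for the chunks produced by split.
$chunks = array();
foreach (array("kurt\ncurt\nkurt\n", "dave\nkurt\ndave\n") as $i => $body) {
    $chunks[$i] = tempnam(sys_get_temp_dir(), 'chunk');
    file_put_contents($chunks[$i], $body);
}
$top = topQueries($chunks, 2);
print_r($top);
array_map('unlink', $chunks);
```

Only one chunk is loaded at a time, but the running total still has to hold every distinct query, so this only helps if the number of distinct queries fits in memory.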
EDIT: For example:
$arr1 = array('kurt', 'curt', 'kurt', 'dave', 'krist');
$arr2 = array('dave', 'dave', 'krist', 'krist');
array_count_values($arr1) // ('kurt' => 2, 'curt'=>1, 'dave'=>1, 'krist'=>1)
array_count_values($arr2) // ('dave' => 2, 'krist'=>2)
How can I combine the two to form the following array?
('kurt' => 2, 'dave'=>3, 'krist'=>3, 'curt'=>1)
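For reference, the merge being asked about can be sketched as a plain loop that adds the counts key by key. This is only one possible approach, not necessarily the most efficient one, and the function name mergeCounts is hypothetical.

```php
<?php
// Hypothetical helper: adds the counts in $b onto the counts in $a, key by key.
function mergeCounts(array $a, array $b): array
{
    foreach ($b as $term => $count) {
        // Start from 0 when the term has not been seen in $a yet.
        $a[$term] = ($a[$term] ?? 0) + $count;
    }
    return $a;
}

$arr1 = array('kurt', 'curt', 'kurt', 'dave', 'krist');
$arr2 = array('dave', 'dave', 'krist', 'krist');

$merged = mergeCounts(array_count_values($arr1), array_count_values($arr2));
arsort($merged); // sort by count, descending, keeping the keys
print_r($merged);
```

Note that the array union operator (+) would not work here: for a key present in both arrays it keeps the left-hand value instead of summing the two.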