
I have a list of, say, 40 alphabetically sorted terms that I would like to split into groups of similar size while keeping the terms grouped by starting letter.

The goal is to create an alphabetical list in multiple chunks with headers indicating the starting letters of each chunk, like A-D, E-H etc.

I thought about proceeding like this. Starting with the list:

$terms = array('Archers','Arrows','Bees' [etc...]);

Then group them by starting letter in a multidimensional array:

$terms = array(
  'a' => array('Archers','Arrows'),
  'b' => array('Bees'),
   // [..etc..]
  'z' => array('Zebras','Zebus')
);

Then regroup this multidimensional array into four groups of about the same size. Something like this:

$termgroups = array(
  // first group with e.g. 12 items
  'A-C' => array(
        'a' => array('Archers','Arrows'),
        'b' => array('Bees')
  ),
  // second group with e.g. 9 items
  // ...etc...
);

But that would mean a lot of counting and iterating, and maybe making a first attempt and then going over the whole thing again.

I'm not sure how to approach this task. I have the feeling it has been done many times before, but I don't know what it is called.

How would you approach that?

Urs

2 Answers


This is not a trivial task. Here is another question about linear partitioning. Luckily, you can find a PHP implementation of the algorithm there. With that in hand, your problem is reduced to finding a way to use the existing solution:

sort($terms);

$mappedToFirstLetter = array_reduce(
    $terms,
    function ($mappedToFirstLetter, $term) {
        $letter = strtolower(substr($term, 0, 1));

        if (!isset($mappedToFirstLetter[$letter])) {
            $mappedToFirstLetter[$letter] = [];
        }

        $mappedToFirstLetter[$letter][] = $term;

        return $mappedToFirstLetter;
    },
    []
);

// Count words for each letter in order to use
// linear partition algorithm.
$countByLetters = array_values(array_map('count', $mappedToFirstLetter));

$numberOfGroups = 4;

$groups = linear_partition($countByLetters, $numberOfGroups);

// Group words using linear partition algorithm results.
$chunked = array_reduce(
    $groups,
    function ($chunked, $group) use (&$mappedToFirstLetter) {
        // Get portion of words.
        $chunk = array_reduce(range(1, count($group)), function ($chunk) use (&$mappedToFirstLetter) {
            $chunk[key($mappedToFirstLetter)] = array_shift($mappedToFirstLetter);
            return $chunk;
        }, []);

        // Generate group name using chunk keys.
        $key = preg_replace_callback(
            '/^([a-z])(?:([a-z]*)([a-z]))?$/',
            function ($matches) {
                $matches = array_pad($matches, 4, '');
                return $matches[1] . ($matches[3] ? '-' : '') . $matches[3];
            },
            implode('', array_keys($chunk))
        );
        $chunked[$key] = $chunk;

        return $chunked;
    },
    []
);

You can find the linear_partition function among the answers to the question mentioned above.
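
For readers who do not have that implementation at hand, here is a minimal sketch of a dynamic-programming linear partition. It is not the code from the referenced question. Its return shape (a list of groups, each holding the counts assigned to it) is an assumption inferred from the way the snippet above calls count($group).

```php
<?php
// Sketch only: splits $seq into at most $k contiguous groups so that
// the largest group sum is as small as possible.
function linear_partition(array $seq, int $k): array
{
    $seq = array_values($seq);
    $n = count($seq);
    if ($n === 0 || $k <= 0) {
        return [];
    }
    $k = min($k, $n); // never more groups than elements

    // $prefix[$i] is the sum of the first $i elements.
    $prefix = [0];
    foreach ($seq as $i => $value) {
        $prefix[$i + 1] = $prefix[$i] + $value;
    }

    // $cost[$i][$j]: smallest possible "largest group sum" when the
    // first $i elements are split into $j groups.
    // $cut[$i][$j]: how many elements go into the first $j - 1 groups.
    $cost = [];
    $cut = [];
    for ($i = 1; $i <= $n; $i++) {
        $cost[$i][1] = $prefix[$i];
        $cut[$i][1] = 0;
    }
    for ($j = 2; $j <= $k; $j++) {
        for ($i = $j; $i <= $n; $i++) {
            $cost[$i][$j] = PHP_INT_MAX;
            for ($x = $j - 1; $x < $i; $x++) {
                $candidate = max($cost[$x][$j - 1], $prefix[$i] - $prefix[$x]);
                if ($candidate < $cost[$i][$j]) {
                    $cost[$i][$j] = $candidate;
                    $cut[$i][$j] = $x;
                }
            }
        }
    }

    // Walk the cut table backwards to recover the groups in order.
    $groups = [];
    $end = $n;
    for ($j = $k; $j >= 1; $j--) {
        $start = $cut[$end][$j];
        array_unshift($groups, array_slice($seq, $start, $end - $start));
        $end = $start;
    }

    return $groups;
}
```

The triple loop is O(k·n²), which is negligible here because n is at most the number of distinct starting letters.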

Here is a working demo.
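
Since the goal in the question is a list with headers such as A-D, a short sketch of how the resulting array might be printed may help. The function name render_chunks and the sample data are made up for illustration. The only assumption is that the array has the shape built above: range key, then letter, then list of terms.

```php
<?php
// Hypothetical sample in the same shape as $chunked above.
$chunked = [
    'a-b' => ['a' => ['Archers', 'Arrows'], 'b' => ['Bees']],
    'z'   => ['z' => ['Zebras', 'Zebus']],
];

// Render each range key as an upper-case header followed by its terms.
function render_chunks(array $chunked): string
{
    $lines = [];
    foreach ($chunked as $range => $letters) {
        $lines[] = strtoupper((string) $range);
        foreach ($letters as $terms) {
            foreach ($terms as $term) {
                $lines[] = '  ' . $term;
            }
        }
    }

    return implode("\n", $lines);
}

echo render_chunks($chunked), "\n";
```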

By the way, questions like this usually carry a bounty because, as I wrote, this is not a trivial task. Moreover, it is not really a question but a problem, and this is not an answer but a solution to that problem. Still, since there are not many interesting questions out there, it would be a shame not to answer this one.

sevavietl
  • You're right – the question is really a bit "please code it for me"-ish :-) - I wasn't really sure if I should ask it that way. The bounty can't be set immediately after posting though. I think SO should be monetizable btw., but that will probably never happen. – Urs Nov 23 '16 at 13:50
  • I'm looking forward to implementing that, thanks a lot!! :-))) – Urs Nov 23 '16 at 13:52

To do that, I've created two functions: a2z($terms) and chunkItems($terms, $chunkList). When you call chunkItems, it calls a2z, which converts the original array into an array keyed by lowercase starting letter.

For testing purposes, I have generated an A-Z word list.

// Use this section to generate a sample data set
// (for testing purposes only).
$terms = array();
foreach(range('A', 'Z') as $key){
    foreach (range(1, 3) as $value) {
        $terms[] = $key."word".$value;
    }
}

// get output with a to z keys
$termsAtoZ = a2z($terms);
// print a to z output
echo "<pre>";
print_r($termsAtoZ);
echo "</pre>";

// Letter ranges to chunk by
$chunkList = array('A-D', 'E-I', 'J-Z');

// Get the output as chunks
// $terms - original array
$termsChunk = chunkItems($terms,$chunkList);
// print chunked output
echo "<pre>";
print_r($termsChunk);
echo "</pre>";

// Group the terms by lowercased first letter
function a2z($terms){
    // sort terms array a to z
    sort($terms);
    // a - z keys array
    $a2z = array();
    foreach ($terms as $word) {
        $key = str_split($word)[0];
        $a2z[strtolower($key)][] = $word;
    }
    return $a2z;
}

// Build the chunked array from ranges such as 'A-D'. Each range must
// have the form X-Y, because only characters 0 and 2 are read.
function chunkItems($terms,$chunkList){
    // get a-z format array output
    $a2zFormatList = a2z($terms);
    $chunkArray = array();
    // loop chunk list
    foreach($chunkList as $chunk){
        // loop chunk first letter to end letter
        foreach(range(strtolower(str_split($chunk)[0]), strtolower(str_split($chunk)[2])) as $letter){
            // if letter exist in a - z array, then copy that key's array to output  
            if (array_key_exists($letter, $a2zFormatList)) {
                $chunkArray[$chunk][$letter] = $a2zFormatList[$letter];
            }
        }
    }
    return $chunkArray;
}
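
Because the ranges in $chunkList are chosen by hand, it can be useful to check how evenly they split the terms. The helper below is a small sketch for that. The name chunkSizes and the sample data are made up for illustration, and it assumes the range, then letter, then terms shape that chunkItems returns.

```php
<?php
// Count the terms in every chunk, keeping the range labels as keys.
function chunkSizes(array $chunks): array
{
    return array_map(
        function (array $letters) {
            // Add up the number of terms stored under each letter.
            return array_sum(array_map('count', $letters));
        },
        $chunks
    );
}

// Hypothetical sample in the shape chunkItems() produces.
$sample = array(
    'A-B' => array('a' => array('Archers', 'Arrows'), 'b' => array('Bees')),
    'C'   => array('c' => array('Cats')),
);

print_r(chunkSizes($sample));
```

With the generated test data above (three words per letter), the ranges A-D, E-I and J-Z hold 12, 15 and 51 terms respectively, so fixed ranges only balance well when they are tuned to the data.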
Gayan