
I want to sort an array of words alphabetically. Unfortunately, my language (Croatian) has two-character letters (e.g. lj, nj, dž), as well as letters that PHP's sort() function does not order correctly (e.g. č, ć, ž, š, đ).

Here is the Croatian alphabet in the proper order (with some English letters as well):

$alphabet = array(
            'a', 'b', 'c',
            'č', 'ć', 'd',
            'dž', 'đ', 'e',
            'f', 'g', 'h',
            'i', 'j', 'k',
            'l', 'lj', 'm',
            'n', 'nj', 'o',
            'p', 'q', 'r',
            's', 'š', 't',
            'u', 'v', 'w',
            'x', 'y', 'z', 'ž'
          );

And here is a list of words, also properly ordered:

$words = array(
            'alfa', 'beta', 'car', 'čvarci', 'ćup', 'drvo', 'džem', 'đak', 'endem', 'fićo', 'grah', 'hrana', 'idealan', 'jabuka', 'koza', 'lijep', 'ljestve', 'mango',
            'nebo', 'njezin', 'obrva', 'pivnica', 'qwerty', 'riba', 'sir', 'šaran', 'tikva', 'umanjenica', 'večera', 'wind', 'x-ray', 'yellow', 'zakaj', 'žena'
          );

I was thinking of ways to sort it. One way was to split each word into letters. Since I didn't know how to do that because of the multi-character letters, I asked a question and got a good answer that solved that problem (see here). So I looped through the array and split each word into letters using the code from the best answer. After the loop I had a new array (let's call it $words_splitted). Each element of that array is itself an array representing one word.
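For reference, the splitting step can be sketched with nothing but PCRE, which is part of the PHP core. This is not the code from the linked answer: the helper name `split_letters` is hypothetical, it assumes UTF-8 input, and it tries the longest letters of the alphabet first so that 'lj' wins over 'l'.

```php
<?php
// Hypothetical helper: split a UTF-8 word into letters of the given alphabet,
// keeping multi-character letters such as 'dž', 'lj' and 'nj' together.
function split_letters($word, array $alphabet)
{
    // Longest letters first, so the regex alternation tries 'lj' before 'l'.
    usort($alphabet, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $parts = array();
    foreach ($alphabet as $letter) {
        $parts[] = preg_quote($letter, '/');
    }
    // Fallback: any other single character (digits, '-', ...) is its own token.
    $parts[] = '.';

    preg_match_all('/' . implode('|', $parts) . '/us', $word, $matches);
    return $matches[0];
}

// Shortened alphabet, just enough for the two sample words.
$alphabet = array('d', 'dž', 'e', 'j', 'l', 'lj', 'm', 's', 't', 'v');
print_r(split_letters('ljestve', $alphabet)); // lj, e, s, t, v, e
print_r(split_letters('džem', $alphabet));    // dž, e, m
```

One limitation of this greedy approach: it cannot tell a real two-character letter from two separate letters that happen to be adjacent (e.g. 'nadživjeti', where d and ž belong to different parts of the word).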

Array
(
    [0] => Array
        (
            [0] => a
            [1] => l
            [2] => f
            [3] => a
        )

    [1] => Array
        (
            [0] => b
            [1] => e
            [2] => t
            [3] => a
        )

    [2] => Array
        (
            [0] => c
            [1] => a
            [2] => r
        )...
 ...[16] => Array
        (
            [0] => lj
            [1] => e
            [2] => s
            [3] => t
            [4] => v
            [5] => e
        )

The idea was to compare the letters of each array by their index in the $alphabet array. For example, $words_splitted[0][0] would be compared with $words_splitted[1][0], then with $words_splitted[2][0], etc. If we compare the letters 'a' and 'b', 'a' has a smaller index in $alphabet, so it comes before 'b'.
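The comparison described above could look roughly like this. It is only a sketch: the function name `compare_letters` is hypothetical, the alphabet is shortened to the letters needed, and letters missing from the alphabet are assumed to sort before everything else.

```php
<?php
// Hypothetical comparator for two words that are already split into letters
// (one array of letters per word, as in $words_splitted).
function compare_letters(array $a, array $b, array $alphabet)
{
    // Map each letter to its position: array('a' => 0, 'e' => 1, ...).
    $rank = array_flip($alphabet);
    $n = min(count($a), count($b));

    for ($i = 0; $i < $n; $i++) {
        // Assumption: letters missing from the alphabet get rank -1.
        $ra = isset($rank[$a[$i]]) ? $rank[$a[$i]] : -1;
        $rb = isset($rank[$b[$i]]) ? $rank[$b[$i]] : -1;
        if ($ra !== $rb) {
            return $ra < $rb ? -1 : 1;
        }
    }
    // Same prefix: the shorter word comes first.
    return count($a) - count($b);
}

$alphabet = array('a', 'e', 'j', 'l', 'lj', 'o', 'p', 's', 't', 'v');
$words_splitted = array(
    array('lj', 'e', 's', 't', 'v', 'e'), // ljestve
    array('l', 'o', 'p', 'a', 't', 'a'),  // lopata
    array('l', 'a', 'v'),                 // lav
);

usort($words_splitted, function ($x, $y) use ($alphabet) {
    return compare_letters($x, $y, $alphabet);
});

$sorted = array();
foreach ($words_splitted as $letters) {
    $sorted[] = implode('', $letters);
}
print_r($sorted); // lav, lopata, ljestve
```

Because 'lj' is compared as one letter with its own index, 'lopata' ends up before 'ljestve', which a plain character-by-character comparison would get wrong.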

Unfortunately, I got stuck...and I'm not sure how to do this. Any ideas?

NOTE: PHP extensions shouldn't be used.

dodo254

2 Answers


Here is a class that can help you sort an array of strings based on a specific alphabet table:

<?php

/**
 * This class can be used to compare unicode strings.
 * It can be used for easy array sorting.
 * 
 * You can set your own alphabet characters table to be used.
 */
class UnicodeStringComperator {
    private $alphabet = [];

    public function __construct() {
        // We set the default alphabet characters table to a-z.
        $this->alphabet = range('a', 'z');
    }

    /**
     * Set the characters table to use for sorting
     * 
     * @param array $alphabet The characters table for the sorting
     */
    public function setAlphabet($alphabet) {
        $this->alphabet = $alphabet;
    }

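    /**
     * Split the string into an array of letters
     * 
     * Multi-character letters from the alphabet table (e.g. 'lj') are kept
     * together; any other character becomes its own element.
     * 
     * @param string $str The string to split
     * @return array The array of the letters in the string
     */
    public function splitter($str){
        // Try longer letters first so that 'lj' wins over 'l'.
        $letters = $this->alphabet;
        usort($letters, function ($a, $b) { return strlen($b) - strlen($a); });
        $parts = array_map(function ($l) { return preg_quote($l, '/'); }, $letters);
        $parts[] = '.'; // fallback: any single character
        preg_match_all('/' . implode('|', $parts) . '/us', $str, $matches);
        return $matches[0];
    }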

    /**
     * Find the place of the letter in the alphabet table
     * 
     * @param string $chr The letter to find
     * @return int The place of the letter in the table, or -1 if not found
     */
    public function place($chr) {
        // Strict search; array_search returns false (not NULL) on a miss.
        $pos = array_search($chr, $this->alphabet, true);
        return $pos === false ? -1 : $pos;
    }

    /**
     * Do the comparison between the 2 strings
     * 
     * @param string $str1 The first string
     * @param string $str2 The second string
     * @return int -1, 0 or 1 if $str1 < $str2, $str1 == $str2 or $str1 > $str2 respectively
     */
    public function compare($str1, $str2) {
        $chars1 = $this->splitter($str1);
        $chars2 = $this->splitter($str2);
        $len1 = count($chars1);
        $len2 = count($chars2);
        for ($i = 0; $i < $len1 && $i < $len2; $i++) {
            $p1 = $this->place($chars1[$i]);
            $p2 = $this->place($chars2[$i]);
            if ($p1 < $p2) {
                return -1;
            } elseif ($p1 > $p2) {
                return 1;
            }
        }
        // One string is a prefix of the other: the shorter one comes first.
        if ($len1 < $len2) {
            return -1;
        } elseif ($len1 > $len2) {
            return 1;
        }
        return 0;
    }

    /**
     * Sort an array of strings based on the alphabet table
     * 
     * @param Array $ar The array of strings to sort
     * @return Array The sorted array.
     */
    public function sort_array($ar) {
        // compare() is an instance method, so pass $this rather than 'self'.
        usort($ar, array($this, 'compare'));
        return $ar;
    }
}

To use it with your specific alphabet, call setAlphabet to configure your own sort table:

<?php
$alphabet = array(
            'a', 'b', 'c',
            'č', 'ć', 'd',
            'dž', 'đ', 'e',
            'f', 'g', 'h',
            'i', 'j', 'k',
            'l', 'lj', 'm',
            'n', 'nj', 'o',
            'p', 'q', 'r',
            's', 'š', 't',
            'u', 'v', 'w',
            'x', 'y', 'z', 'ž'
    );
$comperator = new UnicodeStringComperator();
$comperator->setAlphabet($alphabet);
$sorted_words = $comperator->sort_array($words);
var_dump($sorted_words);

The output is your original array:

array(34) {
  [0] =>
  string(4) "alfa"
  [1] =>
  string(4) "beta"
  [2] =>
  string(3) "car"
  [3] =>
  string(7) "čvarci"
  [4] =>
  string(4) "ćup"
  [5] =>
  string(4) "drvo"
  [6] =>
  string(5) "džem"
  [7] =>
  string(4) "đak"
  [8] =>
  string(5) "endem"
  [9] =>
  string(5) "fićo"
  [10] =>
  string(4) "grah"
  [11] =>
  string(5) "hrana"
  [12] =>
  string(7) "idealan"
  [13] =>
  string(6) "jabuka"
  [14] =>
  string(4) "koza"
  [15] =>
  string(5) "lijep"
  [16] =>
  string(7) "ljestve"
  [17] =>
  string(5) "mango"
  [18] =>
  string(4) "nebo"
  [19] =>
  string(6) "njezin"
  [20] =>
  string(5) "obrva"
  [21] =>
  string(7) "pivnica"
  [22] =>
  string(6) "qwerty"
  [23] =>
  string(4) "riba"
  [24] =>
  string(3) "sir"
  [25] =>
  string(6) "šaran"
  [26] =>
  string(5) "tikva"
  [27] =>
  string(10) "umanjenica"
  [28] =>
  string(7) "večera"
  [29] =>
  string(4) "wind"
  [30] =>
  string(5) "x-ray"
  [31] =>
  string(6) "yellow"
  [32] =>
  string(5) "zakaj"
  [33] =>
  string(5) "žena"
}
Dekel

You can try the Collator class (part of the intl extension).

$words = array( 'alfa', 'beta', 'car', 'čvarci', 'ćup', 'drvo', 'džem', 'đak', 'endem', 'fićo', 'grah', 'hrana', 'idealan', 'jabuka', 'koza', 'lijep', 'ljestve', 'mango', 'nebo', 'njezin', 'obrva', 'pivnica', 'qwerty', 'riba', 'sir', 'šaran', 'tikva', 'umanjenica', 'večera', 'wind', 'x-ray', 'yellow', 'zakaj', 'žena' );
$collator = new Collator('hr_HR');
// or $collator = new Collator( 'hr' );
$collator->sort($words);
print_r($words);

I am not sure what the locale code for Croatian is; you should take a look there. The code is based on a reply to a similar question there.
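Collator comes from the intl extension, which is bundled with PHP but has to be enabled, so it may be missing on a given host (and the question asks to avoid extensions). A minimal sketch of guarding for that follows; the function name `sort_croatian` and the byte-order fallback are illustrative assumptions, not part of the Collator API.

```php
<?php
// Hypothetical wrapper: use Collator when intl is loaded, otherwise fall back.
function sort_croatian(array $words)
{
    if (extension_loaded('intl')) {
        $collator = new Collator('hr_HR');
        $collator->sort($words); // sorts in place and reindexes the array
        return $words;
    }
    // Fallback (assumption): plain byte-order sort. It runs everywhere but
    // puts č, ć, đ, š and ž after z, so it is not correct for Croatian.
    sort($words, SORT_STRING);
    return $words;
}

$result = sort_croatian(array('žena', 'zakaj', 'čvarci', 'car', 'alfa'));
print_r($result);
```

The fallback could instead call a custom comparator built on an alphabet table, which is what the other answer does.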

Mat
  • I think it's shipped by default with PHP, isn't it? If not, I guess it only applies to some versions of it… – Mat Oct 30 '16 at 17:03