I'm trying to sort an array first by its values and then by its keys, but PHP does not handle Persian characters well.
The Persian alphabet is similar to the Arabic alphabet, except for some additional characters such as 'گ چ پ ژ ک'. PHP sorts the Arabic letters of the Persian alphabet correctly, but the additional letters end up out of order.

For example

$str = 'ا ب پ ت ث ج چ ح خ د ذ ر ز ژ ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی';
$arr = explode(' ', $str);

This creates an array ($arr) containing the letters of the Persian alphabet in the correct alphabetical order. If I shuffle it and then use the asort function like this:

shuffle($arr);
asort($arr);
var_dump($arr);

the result looks something like this:

    array
        2 => string 'ا'
        1 => string 'ب'
        22 => string 'ت'
        29 => string 'ث'
        20 => string 'ج'
        12 => string 'ح'
        21 => string 'خ'
        18 => string 'د'
        6 => string 'ذ'
        3 => string 'ر'
        27 => string 'ز'
        17 => string 'ص'
        11 => string 'ض'
        25 => string 'ط'
        5 => string 'ظ'
        16 => string 'ع'
        8 => string 'غ'
        26 => string 'ف'
        14 => string 'ق'
        9 => string 'ل'
        0 => string 'م'
        7 => string 'ن'
        10 => string 'ه'
        28 => string 'و'
        24 => string 'پ'
        23 => string 'چ'
        13 => string 'ژ'
        19 => string 'ک'
        4 => string 'گ'
        15 => string 'ی'

which is wrong!

The item with key 24 (پ) should come right after the item with key 1 (ب), the item with key 23 (چ) should come after the item with key 20 (ج), and so on.

How can I write a function that does something similar to PHP's own sorting functions? Or is there a way to make the PHP functions work for Persian characters?

Farid Rn
  • Use the appropriate sort() function with the sort_flag set to SORT_LOCALE_STRING, having first called setlocale() – Mark Baker Apr 01 '14 at 20:48
  • @MarkBaker I did `setlocale(LC_ALL, 'fa_IR'); asort($arr, SORT_LOCALE_STRING);` but it's not working. Am I doing it wrong? – Farid Rn Apr 01 '14 at 20:54
  • Does your server support that locale? Does `setlocale(LC_ALL, 'fa_IR');` return a Boolean false? – Mark Baker Apr 01 '14 at 20:55
  • @MarkBaker I thought `fa_IR` was the locale for Farsi/Persian, but it's returning false on both Windows and Linux. – Farid Rn Apr 01 '14 at 20:58
  • @MarkBaker I can't find the proper locale for Persian. I can see `fa_IR` in the answers to [this thread](http://stackoverflow.com/questions/3191664/list-of-all-locales-and-their-short-codes) but it's not working! – Farid Rn Apr 01 '14 at 21:02
  • http://docs.moodle.org/dev/Table_of_locales may help – Mark Baker Apr 01 '14 at 21:04
  • @MarkBaker Thanks to you and the Moodle documentation, my problem seems to be solved after generating all locales on my Linux server and doing as you described in your comments. Can you write it up as an answer so I can mark it as accepted? – Farid Rn Apr 02 '14 at 13:33

6 Answers

5

I’ve written the following function to return the Unicode code point of the first character of a UTF-8 encoded string:

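function utf8_ord($str) {
    $str = (string) $str;
    $ord = ord($str);
    $ord_b = decbin($ord);

    // Single-byte (ASCII) character: the byte value is the code point.
    if (strlen($ord_b) <= 7) 
      return $ord;
    // The number of leading 1 bits in the first byte is the sequence length in bytes.
    $len = strlen(strstr($ord_b, "0", true));

    if ($len < 2 || $len > 4 || strlen($str) < $len) 
      return false;
    // Payload bits of the first byte (everything after the first 0 bit).
    $val = substr($ord_b, $len + 1);

    for ($i = 1; $i < $len; $i++) {
        // Pad to 8 bits so that bytes below 0x80 cannot pass as continuation bytes.
        $ord_b = str_pad(decbin(ord($str[$i])), 8, "0", STR_PAD_LEFT);
        if (substr($ord_b, 0, 2) != "10") 
          return false;
        $val .= substr($ord_b, 2);
    }
    $val = bindec($val);
    return (($val > 0x10FFFF) ? null : $val);
}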

Now let’s find out the code points of the characters in your array:

$str = 'ا ب پ ت ث ج چ ح خ د ذ ر ز ژ ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی';
$arr = explode(' ', $str);
print_r(array_map("utf8_ord", $arr));

The output will be:

Array
(
    [0] => 1575
    [1] => 1576
    [2] => 1662
    [3] => 1578
    [4] => 1579
    [5] => 1580
    [6] => 1670
    [7] => 1581
    [8] => 1582
    [9] => 1583
    [10] => 1584
    [11] => 1585
    [12] => 1586
    [13] => 1688
    [14] => 1589
    [15] => 1590
    [16] => 1591
    [17] => 1592
    [18] => 1593
    [19] => 1594
    [20] => 1601
    [21] => 1602
    [22] => 1705
    [23] => 1711
    [24] => 1604
    [25] => 1605
    [26] => 1606
    [27] => 1608
    [28] => 1607
    [29] => 1740
)

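This clearly shows that the code points are not in Persian alphabetical order: the additional Persian letters (پ, چ, ژ, ک, گ and ی) have higher code points than the letters shared with Arabic, so they end up at the end when sorted. I don’t know Persian, so I’m unable to determine whether or not there’s a fault in how Unicode orders the Persian alphabet. All I can say is that PHP is doing its work correctly: it compares the strings byte by byte, which for UTF-8 is the same as comparing code points.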
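
The same effect can be seen without the helper function. The following is only a small sketch using the string from the question; the variable names `$letters`, `$sorted` and `$tail` are mine. It sorts the letters with a plain byte-wise comparison and prints the last six entries, which are exactly the letters with the highest code points in the list above:

```php
<?php
// Letters in Persian alphabetical order (the same string as in the question).
$letters = explode(' ', 'ا ب پ ت ث ج چ ح خ د ذ ر ز ژ ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی');

$sorted = $letters;
sort($sorted, SORT_STRING);   // plain byte-wise string comparison

// The six letters with the highest code points are pushed to the end.
$tail = array_slice($sorted, -6);
echo implode(' ', $tail), "\n";   // prints: پ چ ژ ک گ ی
```

This matches the tail of the var_dump() output in the question, so the order PHP produces is consistent with code point order.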

Mohammad
Sharanya Dutta
5

To list the available locales, you can use:

print_r(ResourceBundle::getLocales(''));

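Both 'fa' and 'fa_IR' were listed for me. Note, however, that this lists the locales known to the intl extension, which are not necessarily installed as system locales for setlocale(). In my case setlocale() still returned false for 'fa_IR', so I tested with 'fa':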

setlocale(LC_ALL, 'fa');
asort($arr, SORT_LOCALE_STRING);
var_dump($arr);

but this still did not sort in the proper order for me.

After a bit more googling, the solution that finally worked for me to sort Unicode Persian letters was the Collator class:

$col = new \Collator('fa_IR');
$col->asort($arr);
var_dump($arr);
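
Since the question asks for sorting by value first and then by key, here is a rough sketch of how that could be built on top of a comparison function. `sortByValueThenKey()` is a hypothetical helper of mine, not part of PHP or intl, and the sample data is made up. It uses `Collator::compare()` when the intl extension is loaded and falls back to `strcmp()` (plain byte order) otherwise:

```php
<?php
// Hypothetical helper: sort by value using $compare, break ties by key.
function sortByValueThenKey(array $arr, callable $compare): array
{
    uksort($arr, function ($k1, $k2) use ($arr, $compare) {
        $byValue = $compare($arr[$k1], $arr[$k2]);
        // Equal values: fall back to comparing the keys.
        return $byValue !== 0 ? $byValue : ($k1 <=> $k2);
    });
    return $arr;
}

// Made-up sample data with a duplicated value under keys 2 and 3.
$names = [3 => 'پدرام', 1 => 'بابک', 2 => 'پدرام', 0 => 'تهران'];

if (class_exists('Collator')) {
    $col = new Collator('fa_IR');
    $sorted = sortByValueThenKey($names, [$col, 'compare']);
} else {
    // Assumption: without intl this falls back to byte order, which is not Persian order.
    $sorted = sortByValueThenKey($names, 'strcmp');
}
print_r($sorted);
```

Passing the comparison in as a callable keeps the tie-breaking logic separate from the collation, so either comparison can be swapped in.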

I know the question is old, but this might still help people who arrive here looking for an answer.

Saeid
  • As you can see, I haven't marked any answer as accepted because none of the solutions worked for me. Your workaround seems promising, I'll give it a try. Thanks a lot. – Farid Rn Oct 16 '17 at 16:01
5

To sort an array by Persian characters, first note that Unicode does not encode the Persian letters in alphabetical order. My suggestion is therefore to define an array of the Persian letters in the correct order and sort the subject array according to it. For example:

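function persianSort($item1, $item2){
    // Persian letters in alphabetical order; the key is the rank of the letter.
    // 'آ' is not a separate letter of the alphabet; it is ranked before 'ا' here.
    static $persian_characters = [
        0 =>  'آ',
        1 =>  'ا',
        2 =>  'ب',
        3 =>  'پ',
        4 =>  'ت',
        5 =>  'ث',
        6 =>  'ج',
        7 =>  'چ',
        8 =>  'ح',
        9 =>  'خ',
        10 =>  'د',
        11 =>  'ذ',
        12 =>  'ر',
        13 =>  'ز',
        14 =>  'ژ',
        15 => 'س',
        16 => 'ش',
        17 =>  'ص',
        18 =>  'ض',
        19 =>  'ط',
        20 =>  'ظ',
        21 =>  'ع',
        22 =>  'غ',
        23 =>  'ف',
        24 =>  'ق',
        25 =>  'ک',
        26 =>  'گ',
        27 =>  'ل',
        28 =>  'م',
        29 =>  'ن',
        30 =>  'و',
        31 =>  'ه',
        32 =>  'ی',
    ];

    // Identical strings (including two empty remainders): stop the recursion.
    if($item1 === $item2)
        return 0;
    // Compare one UTF-8 character at a time rather than two bytes,
    // so that single-byte characters such as spaces do not break the alignment.
    $char1 = mb_substr($item1, 0, 1, 'UTF-8');
    $char2 = mb_substr($item2, 0, 1, 'UTF-8');
    if($char1 === $char2)
        return persianSort(mb_substr($item1, 1, null, 'UTF-8'), mb_substr($item2, 1, null, 'UTF-8'));
    $rank1 = array_search($char1, $persian_characters, true);
    $rank2 = array_search($char2, $persian_characters, true);
    // Anything not in the table (end of string, space, ...) sorts before all letters.
    if($rank1 === false && $rank2 === false)
        return strcmp($char1, $char2);
    if($rank1 === false)
        return -1;
    if($rank2 === false)
        return 1;
    return $rank1 < $rank2 ? -1 : 1;
}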

$states = ['گیلان', 'گرگان', 'یزد', 'سمنان', 'تهران', 'اردبیل', 'کرمان', 'چهار محال بختیاری', 'مشهد', 'اصفهان', 'قم', 'آستارا'];

usort($states, "persianSort");

print_r($states);

The code above sorts an unordered array of Iranian province and city names. Its output is as follows:

Array
(
    [0] => آستارا
    [1] => اردبیل
    [2] => اصفهان
    [3] => تهران
    [4] => چهار محال بختیاری
    [5] => سمنان
    [6] => قم
    [7] => کرمان
    [8] => گرگان
    [9] => گیلان
    [10] => مشهد
    [11] => یزد
)
Yusef Shiri
0

I created a custom JavaScript sort function for Persian arrays:

var alphabets = ["ا", "ب", "پ", "ت", "ث", "ج", "چ", "ح", "خ", "د",
      "ذ", "ر", "ز", "ژ", "س", "ش", "ص", "ض", "ط", "ظ", "ع", "غ",
      "ف", "ق", "ک", "گ", "ل", "م", "ن", "و", "ه", "ی"];

  function PersianOrder(){
      var persianArray = ["ایمان", "محمدرضا", "ژوله", "چمدان", "پدرام", "پاشی","پاشا"];
      persianArray.sort(function (a, b) {
          return CharCompare(a, b, 0);
      });
      return persianArray;
  }

  // Compare a and b letter by letter, starting at position index.
  function CharCompare(a, b, index) {
      // One of the strings is exhausted: the shorter one sorts first.
      if (index == a.length || index == b.length)
          return a.length - b.length;
      var aChar = alphabets.indexOf(a.charAt(index));
      var bChar = alphabets.indexOf(b.charAt(index));
      if (aChar != bChar)
          return aChar - bChar;
      else
          return CharCompare(a, b, index + 1);
  }

I hope this function helps you.

Iman Bahrampour
0

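This is an easy way to sort any type of array by the Persian alphabet.

It's important to call setlocale(LC_ALL, 'fa_IR') and to pass SORT_LOCALE_STRING in your code. This only works if the fa_IR locale is installed on the server; otherwise setlocale() returns false.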

Example of a multidimensional array:

$arr = [
    [
        "name"=> "پژمان",
        "family"=> "رضایی",
        "age"=> "20",
    ],
    [
        "name"=> "بابک",
        "family"=> "قاسمی",
        "age"=> "25",
    ],
    [
        "name"=> "محمد",
        "family"=> "حسینی",
        "age"=> "19",
    ],
    [
        "name"=> "هاشم",
        "family"=> "مقدم",
        "age"=> "28",
    ],
    [
        "name"=> "آفرینش",
        "family"=> "دلربا",
        "age"=> "18",
    ],
    [
        "name"=> "مونا",
        "family"=> "محمدی",
        "age"=> "26",
    ],

];

setlocale(LC_ALL, 'fa_IR');
$faColumns = array_column($arr, 'name');
array_multisort($faColumns, SORT_ASC, SORT_LOCALE_STRING , $arr);

print_r($arr);

Example of an indexed array:

$arr1 = ["پژمان","بابک","محمد","هاشم","آفرینش","مونا"];

setlocale(LC_ALL, 'fa_IR');
asort($arr1, SORT_LOCALE_STRING);

print_r($arr1);
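
As the comments on the question show, setlocale() can silently fail when the locale is not installed, so it may be worth checking its return value before sorting. The following is only a sketch: `setFirstAvailableLocale()` is a hypothetical helper of mine, and the candidate locale names are assumptions that differ between systems.

```php
<?php
// Hypothetical helper: try several spellings of the Persian locale and
// return the one the system accepted, or false if none is installed.
function setFirstAvailableLocale(array $candidates)
{
    // setlocale() accepts an array and tries each entry until one succeeds.
    return setlocale(LC_COLLATE, $candidates);
}

// Assumed candidate names; check which ones exist on your server.
$locale = setFirstAvailableLocale(['fa_IR.UTF-8', 'fa_IR.utf8', 'fa_IR', 'fa']);

$arr1 = ["پژمان", "بابک", "محمد", "هاشم", "آفرینش", "مونا"];

if ($locale === false) {
    echo "No Persian locale is installed, so SORT_LOCALE_STRING would not sort in Persian order.\n";
} else {
    asort($arr1, SORT_LOCALE_STRING);
    print_r($arr1);
}
```

Only LC_COLLATE is set here because collation is the only category the sort depends on.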
0

As I found out, the code points in the UTF-8 table are not in Persian alphabetical order, so PHP needs a custom comparison function to sort Persian words the way they should be sorted:

<?php 
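function persianCmp($a, $b) {
    // Persian letters in alphabetical order; array_flip() turns them into letter => rank.
    static $per = null;
    if ($per === null) {
        $per = array_flip([
            "آ", "ا", "ب", "پ",
            "ت", "ث", "ج", "چ",
            "ح", "خ", "د", "ذ",
            "ر", "ز", "ژ", "س",
            "ش", "ص", "ض", "ط",
            "ظ", "ع", "غ", "ف",
            "ق", "ک", "گ", "ل",
            "م", "ن", "و", "ه",
            "ی"
        ]);
    }
    // Split both words into UTF-8 characters and compare them letter by letter,
    // not only by their first letter.
    $charsA = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $charsB = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $n = min(count($charsA), count($charsB));
    for ($i = 0; $i < $n; $i++) {
        // Characters missing from the table get rank -1 and sort before all letters.
        $rankA = $per[$charsA[$i]] ?? -1;
        $rankB = $per[$charsB[$i]] ?? -1;
        if ($rankA != $rankB) {
            return ($rankA < $rankB) ? -1 : 1;
        }
    }
    // All shared positions are equal: the shorter word comes first.
    return count($charsA) <=> count($charsB);
}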
$persianWords = [ "هدیه","حمیدرضا","غلامی","بروجنی","آیه","چرم","بسته"];
var_dump($persianWords);
usort($persianWords, "persianCmp");
var_dump($persianWords);
/*array(7) {
  [0]=>string(8) "هدیه"
  [1]=>string(14) "حمیدرضا"
  [2]=>string(10) "غلامی"
  [3]=>  string(12) "بروجنی"
  [4]=>  string(6) "آیه"
  [5]=>  string(6) "چرم"
  [6]=>  string(8) "بسته"
}
array(7) {
  [0]=>  string(6) "آیه"
  [1]=>  string(12) "بروجنی"
  [2]=>  string(8) "بسته"
  [3]=>  string(6) "چرم"
  [4]=>  string(14) "حمیدرضا"
  [5]=>  string(10) "غلامی"
  [6]=>  string(8) "هدیه"
}
*/