I'm using php-gd to draw text onto an image, but I have a problem with text encoding and direction. The text is in Arabic, which is a right-to-left (RTL) language, and the same image will also contain some English phrases.

Problem:

imagettftext($image, 18, 0, 317, 141,$font_color, 'breeco.ttf', $Arabic->utf8Glyphs($friends[0]['name']));

If the text is in English (LTR), it is drawn at an x position of 317, which is correct. But when the text is RTL, it is drawn at the same x position of 317, which is not correct.

Is there any way to detect whether a string is RTL?

Ahmed Atef
  • Is `$friends[0]['name']` already in UTF-8? If so, does it carry with it LTR and RTL markers? If not, probably all you could do is look at the content of the string, character-by-character, and determine if any/all of it falls within the ranges of RTL languages. – Phil Perry Dec 31 '13 at 18:25
  • @Phil Yes, it is in UTF-8, and yes, it carries LTR and RTL. It varies: sometimes it is RTL and sometimes LTR. – Ahmed Atef Dec 31 '13 at 18:34

2 Answers

This is actually trickier than it should be. Each Unicode character has a property that tells us whether it is an RTL or LTR character, but I don't see a way of reading this property in PHP; instead, you need to look it up in a table of the Unicode characters.

I've put together a rather inefficient solution below, but I would suggest looking at this PHP implementation of Stringprep if you need something more robust. This library will also check the validity of the strings, e.g. it can enforce rules such as "no mix of RTL and LTR chars in the same string". However, it is designed for preparing strings for use in internet protocols rather than for standard text, so the restrictions it imposes might get in the way if you only want to check the text direction.

Thanks to this StackOverflow answer for information about where to get the Unicode data and how to interpret it.

First, we can create a file that lists only the characters whose bidirectional class is "R" or "AL" (RandALCat); this class is stored in the 5th field of the Unicode data. The command below fetches the data from that URL, keeps only the lines that have AL or R in the 5th field, pads the resulting hex codes to 6 characters, and saves them in a file called RandALCat.txt.

curl http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt | \
    grep -E "^([^;]*;){4}(AL|R);" | \
    awk -F";" '{ printf("%06s\n", $1) }' > RandALCat.txt
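The same filtering can also be sketched in PHP, which may be handy if `curl`, `grep` or `awk` are not available, or if your awk does not zero-pad `%06s`. This is only a minimal sketch: `extractRandALCat` is a hypothetical helper name, and the three sample lines are typed in by hand in the UnicodeData.txt format rather than downloaded.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// zero-padded, upper-case hex codes of every character whose bidirectional
// class (the 5th field of a UnicodeData.txt line) is "R" or "AL".
function extractRandALCat(array $lines): array {
    $codes = [];
    foreach ($lines as $line) {
        $fields = explode(';', $line);
        // $fields[0] is the code point, $fields[4] is the bidirectional class.
        if (isset($fields[4]) && in_array($fields[4], ['R', 'AL'], true)) {
            $codes[] = str_pad(strtoupper($fields[0]), 6, '0', STR_PAD_LEFT);
        }
    }
    return $codes;
}

// Three sample lines in the UnicodeData.txt format.
$sample = [
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
    '05D0;HEBREW LETTER ALEF;Lo;0;R;;;;;N;;;;;',
    '0627;ARABIC LETTER ALEF;Lo;0;AL;;;;;N;;;;;',
];
print_r(extractRandALCat($sample));

// To build the real file from a downloaded copy of UnicodeData.txt:
// $lines = file('UnicodeData.txt', FILE_IGNORE_NEW_LINES);
// file_put_contents('RandALCat.txt', implode("\n", extractRandALCat($lines)) . "\n");
```

Only the Hebrew and Arabic sample lines pass the filter; the Latin one is dropped because its class is "L".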

We can then use this file in a function which tests each character in a string against it:

<?php

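function isRTL($testString) {

    // Load the list of R/AL code points (one 6-digit hex code per line).
    $RandALCat = file('RandALCat.txt', FILE_IGNORE_NEW_LINES);
    // Decode the UTF-8 string into an array of integer code points.
    $codePoints = unpack('V*', iconv('UTF-8', 'UTF-32LE', $testString));

    foreach ($codePoints as $codePoint) {
        $hexCode = strtoupper(str_pad(dechex($codePoint), 6, '0', STR_PAD_LEFT));
        // Use in_array() rather than a truthiness test on array_search(),
        // which returns index 0 (falsy) for the first entry in the list.
        if (in_array($hexCode, $RandALCat, true)) {
            return true;
        }
    }

    return false;

}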

$englishText = 'Hello';
$arabicText = 'السلام عليكم';

var_dump(isRTL($englishText));
var_dump(isRTL($arabicText));

If you save this as test.php or something then run it, you should see this output:

$ php -q test.php
bool(false)
bool(true)
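If the intl extension is available (PHP 7.0 or later), `IntlChar::charDirection()` reports the bidirectional class of a character directly, so the lookup file may not be needed at all. The following is a minimal sketch under that assumption; `isRtlIntl` is a hypothetical name and is not part of the solution above.

```php
<?php
// Sketch only: requires the intl extension (the IntlChar class, PHP >= 7.0).
// Returns true if any character in $text has the bidirectional class
// "R" (right-to-left) or "AL" (right-to-left Arabic).
function isRtlIntl(string $text): bool {
    // Split the UTF-8 string into single characters; invalid UTF-8 makes
    // preg_split() return false, which is treated as "no characters" here.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        $direction = IntlChar::charDirection($char);
        if ($direction === IntlChar::CHAR_DIRECTION_RIGHT_TO_LEFT
            || $direction === IntlChar::CHAR_DIRECTION_RIGHT_TO_LEFT_ARABIC) {
            return true;
        }
    }
    return false;
}
```

It is called the same way as `isRTL()`, for example `var_dump(isRtlIntl($arabicText));`.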
madebydavid

You can use the following regular expression:

$rtlChar = '/[\x{0590}-\x{083F}]|[\x{08A0}-\x{08FF}]|[\x{FB1D}-\x{FDFF}]|[\x{FE70}-\x{FEFF}]/u';

I borrowed the JavaScript version of it from one of the Twitter libraries. So your function would look like this:

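function isRtl($value) {
    // Hebrew, Arabic and related blocks, plus their presentation forms.
    $rtlChar = '/[\x{0590}-\x{083F}]|[\x{08A0}-\x{08FF}]|[\x{FB1D}-\x{FDFF}]|[\x{FE70}-\x{FEFF}]/u';
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    return preg_match($rtlChar, $value) === 1;
}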
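To tie this back to the x position in the question, here is a rough sketch of how the result could be used. It assumes that for RTL text the 317 should be the right edge of the text rather than the left edge, which the question does not state explicitly. `startX` is a hypothetical helper, and the GD calls are left as comments because they need the gd extension and the font file.

```php
<?php
// Same range check as in the function above, repeated so the sketch is self-contained.
function isRtl(string $value): bool {
    $rtlChar = '/[\x{0590}-\x{083F}]|[\x{08A0}-\x{08FF}]|[\x{FB1D}-\x{FDFF}]|[\x{FE70}-\x{FEFF}]/u';
    return preg_match($rtlChar, $value) === 1;
}

// Hypothetical helper: returns the x coordinate to pass to imagettftext().
// Assumption: for RTL text, $anchorX marks the right edge of the text,
// so the drawing has to start one text width to the left of it.
function startX(int $anchorX, int $textWidth, bool $rtl): int {
    return $rtl ? $anchorX - $textWidth : $anchorX;
}

var_dump(startX(317, 100, false)); // int(317)
var_dump(startX(317, 100, true));  // int(217)

// Possible use with GD, following the call from the question:
// $name  = $friends[0]['name'];
// $text  = $Arabic->utf8Glyphs($name);
// $box   = imagettfbbox(18, 0, 'breeco.ttf', $text);
// $width = $box[2] - $box[0]; // lower-right x minus lower-left x
// imagettftext($image, 18, 0, startX(317, $width, isRtl($name)), 141,
//              $font_color, 'breeco.ttf', $text);
```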
maqduni