Can I get the Unicode value of a character, or vice versa, with PHP?

Question

Is it possible to input a character and get its Unicode value back? For example, I can put &#12103; in HTML to output "⽇". Is it possible to pass that character as an argument to a function and get the number back, without building a Unicode table?

$val = someFunction("⽇");//returns 12103

or the reverse?

$val2 = someOtherFunction(12103);//returns "⽇"

I would like to be able to output the actual characters to the page, not the codes, and I would also like to get the code from the character if possible. The closest I have come is php.net/manual/en/function.mb-decode-numericentity.php, but I can't get it working. Is this the function I need, or am I on the wrong track?

score 39 · Accepted Answer · edited Oct 17 '21 at 16:47

function _uniord($c) {
    if (ord($c[0]) >=0 && ord($c[0]) <= 127)
        return ord($c[0]);
    if (ord($c[0]) >= 192 && ord($c[0]) <= 223)
        return (ord($c[0])-192)*64 + (ord($c[1])-128);
    if (ord($c[0]) >= 224 && ord($c[0]) <= 239)
        return (ord($c[0])-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
    if (ord($c[0]) >= 240 && ord($c[0]) <= 247)
        return (ord($c[0])-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
    if (ord($c[0]) >= 248 && ord($c[0]) <= 251)
        return (ord($c[0])-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
    if (ord($c[0]) >= 252 && ord($c[0]) <= 253)
        return (ord($c[0])-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
    if (ord($c[0]) >= 254 && ord($c[0]) <= 255)    //  error
        return FALSE;
    return 0;
}   //  function _uniord()

and

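function _unichr($o) {
    if (function_exists('mb_convert_encoding')) {
        // Decode the numeric entity into UTF-8 (the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2)
        return mb_convert_encoding('&#'.intval($o).';', 'UTF-8', 'HTML-ENTITIES');
    } else {
        // Fallback without mbstring: only correct for code points below 128
        return chr(intval($o));
    }
}   // function _unichr()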

edited Oct 17 '21 at 16:47

gturri

answered Feb 20 '12 at 12:59

Mark Baker

209,507
32
346
385

Hi Mark, Thanks for the code. Is this from somewhere online with an explanation on how it works? – Totoro Feb 20 '12 at 13:18
It's code I use in PHPExcel; but I can't recall where I got it from now, or find a reference to its source... but it's used in a number of libraries – Mark Baker Feb 20 '12 at 13:31
The first function takes a string (a Unicode character consists of several octets), checks the first bits of the first octet to find out the length of the character in octets (I think it's using UTF8). Then strips the control bits from every octet, and turns the remaining bits (those forming the unicode character itself) into the number you want. That conversion is straightforward, just turning the integer to string. – Sebastián Grignoli Feb 20 '12 at 13:36
You are a lifesaver!! Thank you! – Sangar82 Apr 10 '18 at 07:48
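
To make the explanation in the comments above concrete, here is a minimal sketch that decodes only the 3-byte case by hand. The helper name decodeThreeByteUtf8 is hypothetical and not part of the answer; it assumes the input is a valid 3-byte UTF-8 sequence and does no validation.

```php
<?php
// Hypothetical helper for illustration only: decodes one 3-byte UTF-8 sequence.
function decodeThreeByteUtf8(string $bytes): int
{
    $b0 = ord($bytes[0]); // lead byte, bit pattern 1110xxxx
    $b1 = ord($bytes[1]); // continuation byte, bit pattern 10xxxxxx
    $b2 = ord($bytes[2]); // continuation byte, bit pattern 10xxxxxx

    // Mask off the marker bits, then shift each payload into place.
    return (($b0 & 0x0F) << 12) | (($b1 & 0x3F) << 6) | ($b2 & 0x3F);
}

// "⽇" is encoded in UTF-8 as the bytes E2 BD 87.
echo decodeThreeByteUtf8("\xE2\xBD\x87"), "\n"; // 12103

// The same value via the subtract-and-multiply form used in _uniord():
echo (0xE2 - 224) * 4096 + (0xBD - 128) * 64 + (0x87 - 128), "\n"; // 12103
```

Subtracting 224 (or 128) removes the same marker bits that the masks clear, and multiplying by 4096 or 64 is equivalent to shifting left by 12 or 6 bits.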

score 26 · Answer 2 · answered Dec 12 '14 at 12:56

Here's a more compact implementation of unichr/uniord based on iconv and pack:

// code point to UTF-8 string
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

// UTF-8 string to code point
function uniord($s) {
    return unpack('V', iconv('UTF-8', 'UCS-4LE', $s))[1];
}

answered Dec 12 '14 at 12:56

bobince

Jail Breaking... :D – kupendra Jun 22 '16 at 14:23
I think it should have been UTF-32LE, not UCS*. But the trick is definitely creative. – AnrDaemon Oct 18 '18 at 23:20
This just saved my life! – CodiMech25 Feb 12 '19 at 09:51
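
As a rough illustration of why the pack/iconv trick works, the sketch below prints the intermediate bytes. It assumes the iconv extension is loaded and uses the UTF-32LE name suggested in the comment above rather than UCS-4LE; the variable names are only for this example.

```php
<?php
// Sketch assuming the iconv extension is available.

// pack('V', ...) yields a 32-bit unsigned little-endian integer: 4 raw bytes.
$raw = pack('V', 12103);
echo bin2hex($raw), "\n"; // 472f0000

// Those 4 bytes are the UTF-32LE encoding of the code point,
// so iconv can transcode them to UTF-8.
$char = iconv('UTF-32LE', 'UTF-8', $raw);
echo bin2hex($char), "\n"; // e2bd87, the UTF-8 bytes of "⽇"

// Reverse direction: UTF-8 to UTF-32LE, then unpack the integer.
$code = unpack('V', iconv('UTF-8', 'UTF-32LE', $char))[1];
echo $code, "\n"; // 12103
```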

MAChitgarha · Answer 3 · 2019-09-11T06:45:28.737

20

If you're using PHP 7.2 (or later), you don't need to define a new function. The Multibyte String extension provides two functions for this purpose.

To get the code point of a character (i.e. its Unicode value), use mb_ord(); to get the character for a given code point, use mb_chr().

E.g.:

mb_chr(12103, "utf8"); // ⽇
mb_ord("⽇", "utf8"); // 12103
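
For a whole string rather than a single character, one possible approach is to split it first and map each piece. This is only a sketch: it assumes PHP 7.4 or later (for mb_str_split() and arrow functions) with mbstring loaded, and the variable names are made up for the example.

```php
<?php
// Sketch assuming PHP 7.4+ with the mbstring extension.
$input = "\u{65E5}\u{672C}"; // the two characters "日本"

// Split into characters, then convert each one to its code point.
$codes = array_map(
    fn(string $ch): int => mb_ord($ch, 'UTF-8'),
    mb_str_split($input, 1, 'UTF-8')
);
print_r($codes); // 26085 and 26412

// Convert the code points back and join them into a string again.
$text = implode('', array_map(fn(int $cp): string => mb_chr($cp, 'UTF-8'), $codes));
echo $text, "\n"; // 日本
```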

edited Sep 11 '19 at 06:45

answered Mar 04 '18 at 16:56

MAChitgarha

user23127 · Answer 4 · 2014-05-05T10:16:50.417

10

This also works (for someone who understands bit shifting, this might be more readable than Mark Baker's answer):

public function ordinal($str){
    $charString = mb_substr($str, 0, 1, 'utf-8');
    $size = strlen($charString);        
    $ordinal = ord($charString[0]) & (0xFF >> $size);
    //Merge other characters into the value
    for($i = 1; $i < $size; $i++){
        $ordinal = $ordinal << 6 | (ord($charString[$i]) & 127);
    }
    return $ordinal;
}

edited May 05 '14 at 10:16

answered May 04 '14 at 14:38

user23127

Hello, I tested your answer against Mark's and I think there is an issue with yours (I am not good with bit shifting, so I don't know what). echo "\n".ordinal("響")." :: "._uniord("響")."\n"; returns 105 :: 38911 (it should be 38911). – Totoro May 05 '14 at 09:34
Hello, thank you for the response. The error seems to be in the default encoding mb_internal_encoding(), if that is not 'utf-8' retrieving the first character fails. I have fixed this by explicitly adding the encoding to mb_substr. – user23127 May 05 '14 at 10:18
I up voted as it works now, but will leave the answer as it was. Thanks for the alternative – Totoro May 05 '14 at 14:35
Sure, I don't really answer for karma :P. – user23127 May 05 '14 at 14:42

Akhil Thayyil · Answer 5 · 2012-02-20T13:28:28.427

3

You can use the following functions

For encoding

string utf8_encode ( string $data )

http://php.net/manual/en/function.utf8-encode.php

For decoding

string utf8_decode ( string $data )

http://php.net/manual/en/function.utf8-decode.php

Also check

http://php.net/manual/en/function.htmlspecialchars.php

<?php

echo htmlspecialchars_decode("&#12103"); // outputs &#12103 unchanged; a browser then renders the entity as ⽇

?>

edited Feb 20 '12 at 13:28

answered Feb 20 '12 at 13:02

Akhil Thayyil

hello Akhil, I have looked at these but they only work with the ascii range characters, anything above that becomes gibberish. – Totoro Feb 20 '12 at 13:08
hello @Akhil, thanks, this works, shame there is no encode option. – Totoro Feb 20 '12 at 13:58
UTF-8 is a Unicode encoding, not Unicode. utf8_decode does not give me the unicode value of the character I pass it (what the question asked for). The question asked about `12103` specifically, where `utf8_encode` and `utf8_decode` both return the same number(/string) that it was passed instead of a unicode character. – Kissaki Jan 15 '16 at 21:11
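
Regarding the missing "encode option" mentioned in the comments above, the numeric-entity functions from the question appear to cover both directions. The following is a sketch, assuming mbstring is loaded; the conversion map shown is one plausible choice, not the only valid one.

```php
<?php
// Sketch assuming the mbstring extension is loaded.
// Conversion map: [first code point, last code point, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$char = "\u{2F47}"; // "⽇"

// Character to decimal numeric entity.
$entity = mb_encode_numericentity($char, $convmap, 'UTF-8');
echo $entity, "\n"; // &#12103;

// Entity back to the character.
$decoded = mb_decode_numericentity('&#12103;', $convmap, 'UTF-8');
var_dump($decoded === $char); // bool(true)

// html_entity_decode() also resolves numeric entities; include the trailing ";".
var_dump(html_entity_decode('&#12103;', ENT_QUOTES, 'UTF-8') === $char); // bool(true)
```

Unlike htmlspecialchars_decode(), these calls return the actual UTF-8 character, so the result does not depend on a browser interpreting the entity.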
