3

So I am playing with this tool:

http://www.unit-conversion.info/texttools/ascii/

When I try this character:

'

I see the value 039, which can be verified at http://www.asciitable.com

But I am curious about:

’

This character in the same tool returns: 226 128 153

But as far as I know, ASCII is 8 bits (or even 7).

What is 226 128 153 here?

Koray Tugay

4 Answers

6

The character you have is U+2019 RIGHT SINGLE QUOTATION MARK, which is also the typographically correct way of representing the apostrophe in most positions.

The site represents the characters in UTF-8. As you can see in the page I linked, this character is encoded as three bytes: 0xE2 0x80 0x99 in hexadecimal, or 226 128 153 in decimal.
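To see where the three bytes come from, here is a minimal sketch of the UTF-8 bit layout for code points up to U+FFFF. The function name `utf8BytesBmp` is made up for illustration, and the sketch deliberately ignores surrogates and code points above U+FFFF.

```javascript
// Hand-rolled UTF-8 encoder for code points up to U+FFFF (illustrative only;
// utf8BytesBmp is a hypothetical helper, not part of any library).
function utf8BytesBmp(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint]; // 1 byte: 0xxxxxxx (plain ASCII)
  }
  if (codePoint < 0x800) {
    return [
      0xc0 | (codePoint >> 6),   // 110xxxxx
      0x80 | (codePoint & 0x3f), // 10xxxxxx
    ];
  }
  return [
    0xe0 | (codePoint >> 12),         // 1110xxxx
    0x80 | ((codePoint >> 6) & 0x3f), // 10xxxxxx
    0x80 | (codePoint & 0x3f),        // 10xxxxxx
  ];
}

console.log(utf8BytesBmp(0x2019)); // [ 226, 128, 153 ]
console.log(utf8BytesBmp(0x27));   // [ 39 ]
```

U+2019 is above U+07FF, so it lands in the three-byte branch, while the plain apostrophe (U+0027) stays a single byte.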

Why does that page use UTF-8 instead of ASCII? Simple. First, ASCII is a subset of UTF-8. Second, UTF-8 supports all of Unicode. So there's rarely a reason to use ASCII if UTF-8 can be used instead.
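The subset relationship can be checked directly. This is a small sketch that assumes a runtime where `TextEncoder` is available as a global (modern browsers and recent Node.js versions); the helper names `utf8Bytes` and `charCodes` are made up for illustration.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helpers: UTF-8 bytes of a string, and its per-character codes.
function utf8Bytes(text) {
  return Array.from(encoder.encode(text));
}
function charCodes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Pure ASCII text: the UTF-8 bytes are exactly the ASCII codes.
console.log(utf8Bytes("It's")); // [ 73, 116, 39, 115 ]
console.log(charCodes("It's")); // [ 73, 116, 39, 115 ]

// With the typographic apostrophe, only that character expands to three bytes.
console.log(utf8Bytes("It\u2019s")); // [ 73, 116, 226, 128, 153, 115 ]
```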

Karol S
2

I have this same issue: I was trying to convert a string to uppercase, ran into this character, and it 'broke' a bunch of methods for converting a string with special characters to uppercase.

I used this solution:

    $text = preg_replace("/[`‛′’‘]/u", "'", $text);

(NOT MINE - taken from here: https://stackoverflow.com/a/24925209/6136613)

This converts it to a regular apostrophe, and you can then use the normal PHP functions on the string.
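For comparison, the same replacement can be sketched in JavaScript. This is only an analogue of the PHP line above, not something from the linked answer, and the function name `normalizeApostrophes` is made up for illustration.

```javascript
// Replace backtick, reversed quote, prime, and left/right single quotes
// with the plain ASCII apostrophe (U+0027).
// normalizeApostrophes is a hypothetical helper name.
function normalizeApostrophes(text) {
  return text.replace(/[\u0060\u201B\u2032\u2019\u2018]/gu, "'");
}

console.log(normalizeApostrophes("it\u2019s").toUpperCase()); // IT'S
```

The character class lists the same five characters as the PHP pattern, written as escapes so the source file's own encoding does not matter.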

Scott T
1

The first character is ASCII, code 39. The second is a Unicode character, code 8217.

See the Unicode character table, specifically the entry for this character.

For more information, read the Unicode article.

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
    <div id='res'></div>
    <script>
    // Show the character code of ’ in the #res element (8217).
    $(document).ready(function(){
      $('#res').html("’".charCodeAt(0));
    });
    </script>
Alexander Dayan
0

It seems that this is the UTF-16 representation. The website is probably converting the characters to their code representation with "’".charCodeAt(0) in JavaScript.
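A quick way to check this guess is to compare what `charCodeAt(0)` returns with the UTF-8 bytes. The sketch below assumes `TextEncoder` is available as a global; the variable names are illustrative.

```javascript
const quote = "\u2019"; // RIGHT SINGLE QUOTATION MARK

// charCodeAt returns one UTF-16 code unit, a single number.
const codeUnit = quote.charCodeAt(0);
console.log(codeUnit, codeUnit.toString(16)); // 8217 2019

// As big-endian UTF-16 bytes that is two values, not three.
const utf16Bytes = [codeUnit >> 8, codeUnit & 0xff];
console.log(utf16Bytes); // [ 32, 25 ]

// The UTF-8 encoding is what yields three bytes.
const utf8 = Array.from(new TextEncoder().encode(quote));
console.log(utf8); // [ 226, 128, 153 ]
```

Under these assumptions, `charCodeAt(0)` yields 8217, while the three values 226 128 153 match the UTF-8 bytes.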