38

How to convert ASCII encoding to UTF8 in PHP

mdrg
user614856

5 Answers

53

ASCII is a subset of UTF-8, so if a document is ASCII then it is already UTF-8.
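To make the point concrete, here is a minimal sketch that checks whether a byte string stays within the 7-bit ASCII range. The helper name `isPureAscii` is made up for illustration. If it returns true, the bytes are already valid UTF-8 and no conversion is needed.

```php
<?php
// Hypothetical helper: true when every byte is in the 7-bit range 0x00-0x7F.
function isPureAscii(string $bytes): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/^[\x00-\x7F]*$/D', $bytes) === 1;
}

var_dump(isPureAscii("Hello, world")); // bool(true)
var_dump(isPureAscii("caf\xC3\xA9"));  // bool(false): "café" in UTF-8 has bytes above 0x7F
```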

Quentin
  • 1
    Word of caution: if the ASCII is "extended" ASCII, then you may encounter issues. https://en.wikipedia.org/wiki/Extended_ASCII – Azeroth2b Mar 29 '17 at 14:26
32

If you know for sure that your current encoding is pure ASCII, then you don't have to do anything, because pure ASCII is already valid UTF-8.

But if you still want to convert, just to be sure that it's UTF-8, then you can use iconv:

$string = iconv('ASCII', 'UTF-8//IGNORE', $string);

The //IGNORE suffix tells iconv to discard any characters it cannot convert, just in case some bytes were not valid ASCII.
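If silently dropping bytes is not acceptable, one alternative is to leave out //IGNORE and check the return value instead. This is only a sketch: `asciiToUtf8Strict` is a hypothetical name, and how iconv treats invalid input can vary with the iconv implementation PHP was built against.

```php
<?php
// Hypothetical helper: convert only when the input converts cleanly as ASCII.
// Without //IGNORE, iconv() signals unconvertible input by returning false.
function asciiToUtf8Strict(string $bytes): ?string
{
    // @ silences the notice iconv raises on bad input; the result is checked instead.
    $converted = @iconv('ASCII', 'UTF-8', $bytes);
    return $converted === false ? null : $converted;
}

$clean = asciiToUtf8Strict("plain text 123");
var_dump($clean === "plain text 123"); // bool(true): ASCII bytes pass through unchanged

$suspect = asciiToUtf8Strict("caf\xE9"); // 0xE9 is outside the ASCII range
var_dump($suspect); // result depends on the underlying iconv implementation
```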

Dwza
Dmitri
7

Use mb_convert_encoding to convert an ASCII string to UTF-8. More info here

$string = "chárêctërs";
print(mb_detect_encoding($string));

$string = mb_convert_encoding($string, "UTF-8");
print(mb_detect_encoding($string));
albertoiNET
  • 1
    The only solution that worked for me, thanks a lot for this! – Acuna Feb 29 '20 at 02:20
  • 1
    This answer is basically wrong. [mb_detect_encoding ()](https://php.net/mb_detect_encoding) is both poorly named and poorly documented. All it does is loop through a system-dependent and typically short list of encodings (on my system, only `ASCII` and `UTF-8`) and return the first one where all your bytes have something assigned. Detecting text encoding programmatically in a reliable way is as hard as detecting whether a picture has a cat. – Álvaro González May 02 '20 at 12:08
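A small sketch of why detection is ambiguous: the same single byte is valid in several legacy encodings and means a different character in each, so the source encoding has to be known and passed explicitly. The byte 0xA4 and the two ISO-8859 variants here are just illustrative picks, and the mbstring extension is assumed to be loaded.

```php
<?php
$byte = "\xA4"; // one byte, valid in both encodings below

// Interpreted as ISO-8859-1 it is the currency sign (U+00A4).
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Interpreted as ISO-8859-15 it is the euro sign (U+20AC).
$asLatin9 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-15');

echo bin2hex($asLatin1), "\n"; // c2a4
echo bin2hex($asLatin9), "\n"; // e282ac
```

Nothing in the byte itself says which reading is right, which is why passing the third argument to mb_convert_encoding matters.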
4

"ASCII is a subset of UTF-8, so..." - so UTF-8 is a set? :)

In other words: any string built with code points from x00 to x7F has indistinguishable representations (byte sequences) in ASCII and UTF-8. Converting such a string is pointless.

Radek M
  • 1
    Key phrase here is the "code points from x00 to x7F". If your "ASCII" has code points from x80 to xFF, then you need to do more work. – Azeroth2b Mar 29 '17 at 14:29
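The "more work" could look something like the sketch below: keep input that is already valid UTF-8 (which includes pure ASCII) and convert everything else from a legacy encoding. The function name `ensureUtf8` and the default of Windows-1252 are assumptions for illustration; the real source encoding has to come from wherever the data originated. The mbstring extension is assumed.

```php
<?php
// Hypothetical helper: return UTF-8, converting only when the input is not already valid UTF-8.
// $assumedLegacy is a guess supplied by the caller, not something that can be detected reliably.
function ensureUtf8(string $bytes, string $assumedLegacy = 'Windows-1252'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // pure ASCII and valid UTF-8 both land here untouched
    }
    return mb_convert_encoding($bytes, 'UTF-8', $assumedLegacy);
}

echo bin2hex(ensureUtf8("abc")), "\n";         // 616263 (unchanged)
echo bin2hex(ensureUtf8("caf\xE9")), "\n";     // 636166c3a9 (0xE9 converted to UTF-8 "é")
echo bin2hex(ensureUtf8("caf\xC3\xA9")), "\n"; // 636166c3a9 (already UTF-8, unchanged)
```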
1

Use utf8_encode()

Man page can be found here http://php.net/manual/en/function.utf8-encode.php

Also read this article from Joel on Software. It provides an excellent explanation of what Unicode is and how it works. http://www.joelonsoftware.com/articles/Unicode.html

Andrii Abramov
thomas
  • 11
    utf8_encode was designed to encode latin-1 into utf-8. Only for latin-1 (which is ISO-8859-1). – Dmitri Feb 13 '11 at 14:50
  • This answer is wrong. As per docs, "Encodes an **ISO-8859-1** string to UTF-8". ISO-8859-1 is not ASCII just like Spanish is not Latin. – Álvaro González May 02 '20 at 12:02
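To illustrate the objection in the comments: an ISO-8859-1 to UTF-8 conversion leaves pure ASCII alone, but it corrupts text that is already UTF-8 by encoding each byte a second time. The sketch below uses mb_convert_encoding with an explicit ISO-8859-1 source as a stand-in for the same conversion, since utf8_encode() is deprecated as of PHP 8.2. The mbstring extension is assumed.

```php
<?php
$ascii = "cafe";
$utf8  = "caf\xC3\xA9"; // "café" already encoded as UTF-8

// Pure ASCII survives an ISO-8859-1 -> UTF-8 conversion unchanged.
var_dump(mb_convert_encoding($ascii, 'UTF-8', 'ISO-8859-1') === $ascii); // bool(true)

// Already-UTF-8 input is double-encoded: each of the two bytes of "é"
// is treated as its own Latin-1 character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n"; // 636166c383c2a9, which displays as "cafÃ©"
```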