2

When I run a script in the terminal containing <?php echo "🚀पीएचपी";, it displays garbage characters instead of the emoji and foreign text.

Specifically, it displays ≡ƒÜÇαñ¬αÑÇαñÅαñÜαñ¬αÑÇ.

However, running a Node.js script containing console.log("🚀पीएचपी") correctly displays the emoji and the foreign text as 🚀पीएचपी.

How can I echo/print emojis and the foreign text properly so they display as intended in the CLI when using PHP?

Any suggestions on how to resolve this and get PHP to display emojis and unicode text correctly in the terminal?

This scenario has been tested in Windows Terminal (PowerShell 7), cmd, and Git Bash (MINGW64).

Running chcp in Windows Terminal returns 65001 (UTF-8), so the terminal itself is properly configured for UTF-8. Reference for chcp: https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers?redirectedfrom=MSDN

Minimal Reproducible Example:

  1. Run chcp 65001 in Windows Terminal/cmd.
  2. Run chcp again to make sure it returns Active code page: 65001.
  3. Run the PHP script below (make sure extension=mbstring is enabled in php.ini):
<?php

$utf8_string = "🚀पीएचपी";
$detected_encoding = mb_detect_encoding($utf8_string);

echo "Detected encoding[$utf8_string]: " . $detected_encoding;
  4. I still got this displayed:
Detected encoding[≡ƒÜÇαñ¬αÑÇαñÅαñÜαñ¬αÑÇ]: UTF-8

ADDENDUM: I am using PHP 7.0. It works in PHP 8.2 but not in PHP 7.0.

Aizzat Suhardi
  • What character encoding did you save your PHP script in? – CBroe Aug 02 '23 at 11:33
  • `default_charset` affects how some encoding/decoding functions work, and would set the charset in the Content-Type response header. You are not using any such functions, and you are not in an HTTP context either - so trying to set this is pretty pointless regarding your issue. – CBroe Aug 02 '23 at 11:35
  • @CBroe I have made sure the PHP script has been saved in UTF-8 format using Notepad++ – Aizzat Suhardi Aug 02 '23 at 11:39
  • Maybe your console is not using UTF-8 encoding (and node somehow automatically compensates for that when writing to console) ...? https://stackoverflow.com/q/5306153/1427878 – CBroe Aug 02 '23 at 11:54
  • Which OS are you using? – Olivier Aug 02 '23 at 12:06
  • @Olivier Windows 11 – Aizzat Suhardi Aug 02 '23 at 12:17
  • @CBroe I have used Windows Terminal (Powershell 7), cmd and GitBash(MINGW64) terminal. – Aizzat Suhardi Aug 02 '23 at 12:18
  • "Windows Terminal can display Unicode and UTF-8 characters such as emoji and characters from a variety of languages"; as told by the reference. And this question is using Windows Terminal (Powershell 7), cmd and GitBash(MINGW64) terminal. reference: https://learn.microsoft.com/en-us/windows/terminal/ – Aizzat Suhardi Aug 02 '23 at 12:24
  • You face a [mojibake](https://en.wikipedia.org/wiki/Mojibake) case (*example in Python for its universal intelligibility*): `"🚀पीएचपी".encode('utf-8').decode('cp437')` returns `≡ƒÜÇαñ¬αÑÇαñÅαñÜαñ¬αÑÇ`. Try `chcp 65001` in `cmd` _before_ running your script… – JosefZ Aug 02 '23 at 22:12
  • I don't remember the exact details, but Windows terminals in PHP are tricky because PHP does not (did not?) use the Unicode Windows API, Windows does not officially support UTF-8 and most terminal emulators don't implement fallback glyphs. – Álvaro González Aug 03 '23 at 11:00
  • @JosefZ the chcp solution doesn't work – Aizzat Suhardi Aug 04 '23 at 08:09
  • @ÁlvaroGonzález Windows Terminal officially supports UTF-8. the ref link is in previous comment – Aizzat Suhardi Aug 04 '23 at 08:15
  • I meant the operating system itself, Windows Terminal is similar to third-party apps on this regard. – Álvaro González Aug 04 '23 at 08:41
  • Please [edit] your question to improve your [mcve]. In particular, share (in `cmd`) `chcp&&type YourScript.php&&YourScript.php` . – JosefZ Aug 04 '23 at 17:05
  • @JosefZ i have updated my minimal reproducible example – Aizzat Suhardi Aug 04 '23 at 17:58
  • Which version of PHP? – Olivier Aug 05 '23 at 07:08
  • @Olivier you have won the bounty. In PHP 7.0.33, it is not working. The problem solved when I tested with PHP 8.2. Well help yourself to give the answer and you will get the bounty. Well legacy PHP have to deal with garbage and mojibakes. – Aizzat Suhardi Aug 05 '23 at 17:58

4 Answers

2

PHP 7.1 introduced a number of changes related to code pages on Windows (see here for the details). One of those changes is the call to php_win32_cp_cli_setup() in the CLI SAPI. That function ultimately calls the SetConsoleOutputCP() Win32 API to set the code page associated with the console.

The code page is set according to the default_charset PHP option. By default, the value of that option is UTF-8, so the code page is set to 65001:

C:\Users\Olivier>C:\php\php.exe -r "echo sapi_windows_cp_get();"
65001

If I set default_charset = "windows-1252" in php.ini, I get:

C:\Users\Olivier>C:\php\php.exe -r "echo sapi_windows_cp_get();"
1252

You mentioned in a comment that you were using PHP 7.0. With that version, the CLI runs with the default OEM code page, which causes your encoding mismatch.
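
As a rough illustration of the check described above, the following sketch prints the configured default_charset and, where the function exists, the console code page PHP is using. The helper name describe_console_code_page is hypothetical, and the function_exists guard is an assumption added so the script also runs on PHP 7.0 or outside the Windows CLI.

```php
<?php
// Sketch only: describe_console_code_page() is a hypothetical helper.
// sapi_windows_cp_get() exists only in Windows builds of PHP >= 7.1,
// so the call is guarded to keep the script runnable elsewhere.
function describe_console_code_page()
{
    if (function_exists('sapi_windows_cp_get')) {
        return 'console code page: ' . sapi_windows_cp_get();
    }
    return 'sapi_windows_cp_get() unavailable (PHP < 7.1 or not the Windows CLI)';
}

echo 'default_charset: ', ini_get('default_charset'), PHP_EOL;
echo describe_console_code_page(), PHP_EOL;
```

Running this under both PHP versions should make the difference visible: the older one falls into the "unavailable" branch, which is consistent with the code page never being set by PHP itself.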

Olivier
  • For PHP 7.0, you have to use shell_exec as in this answer: https://stackoverflow.com/a/76860842/273743 . But I'll give you the bounty points for helping me out with this question. – Aizzat Suhardi Aug 08 '23 at 15:06
1

The issue you're seeing might be related to the terminal's encoding settings, rather than PHP itself. Your terminal needs to support and be set to use UTF-8 to correctly display the emoji and foreign text. The mb_detect_encoding function is detecting the encoding of the string as UTF-8, which is correct.

To verify that PHP is correctly handling the UTF-8 encoded string, you could write the string to a file and then open that file in a text editor that you know supports UTF-8. If the text displays correctly in the text editor, then PHP is handling the UTF-8 encoding correctly, and the issue is likely with your terminal's settings.
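
A minimal sketch of that check, assuming the script itself is saved as UTF-8 and mbstring is enabled. The helper name hex_after_file_round_trip and the scratch file are hypothetical; instead of opening the file in an editor, it dumps the bytes as hex so the result does not depend on any terminal or font.

```php
<?php
// Sketch only: hex_after_file_round_trip() is a hypothetical helper.
// It writes the string to a scratch file, reads it back, and returns
// the raw bytes as hex, bypassing the console entirely.
function hex_after_file_round_trip($text)
{
    $path = tempnam(sys_get_temp_dir(), 'utf8'); // scratch file
    file_put_contents($path, $text);
    $bytes = file_get_contents($path);
    unlink($path);
    return bin2hex($bytes);
}

$text = "🚀पीएचपी"; // assumes this source file is saved as UTF-8
echo 'valid UTF-8: ', var_export(mb_check_encoding($text, 'UTF-8'), true), PHP_EOL;
echo 'bytes: ', hex_after_file_round_trip($text), PHP_EOL;
```

If the hex dump begins with f09f9a80 (the UTF-8 bytes of 🚀), PHP wrote the string unchanged, which points at the console code page rather than at the script.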

Shila Mosammami
  • To be clear, this doesn't _have_ to use UTF-8. This could be done in any encoding that supports the representation of emojis. Just the bytes that PHP outputs (which will be whatever you saved the source code as in your editor) need to be in the encoding that the terminal expects to receive. They just need to match, that's all. – deceze Aug 04 '23 at 14:19
  • @Shila Mosamm Have you tested this in any Windows terminal? Running `chcp` in Windows Terminal outputs 65001 (which is utf-8). I'll add more context in the original question. – Aizzat Suhardi Aug 04 '23 at 16:33
  • Actually, I don't mind if the output is not written well in terminal. I have always tried to do encoding/decoding at the front-end side. – Shila Mosammami Aug 06 '23 at 08:07
0

It seems that it's not the encoding, but rather the font you are using. While the encoding is correct, the font you use may not have the correct glyphs for the Windows terminal (cmd / powershell).

Do you have the Arabic language pack installed? It may be helpful as well.

Just as a point of reference, my output of your script looks like this: [image: script output]

However, it is perfectly normal when I copy and paste it into a browser: Detected encoding[पीएचपी]: UTF-8

Sorry I can't be of more help; I hope this points you in the right direction.

0
// Switch the console to the UTF-8 code page before printing.
shell_exec('chcp 65001');
echo "Hello, पीएचपी";

You have to run shell_exec('chcp 65001') once before outputting emojis and foreign text. This answer has been tested with PHP 7.0 using Windows Terminal and PowerShell.

sapi_windows_cp_set, the counterpart of the sapi_windows_cp_get function pointed out by @Olivier, is only available in PHP >= 7.1.
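
The two approaches can be combined into one version-tolerant helper. This is a sketch under stated assumptions, not a tested recipe for every console: use_utf8_console and its return labels are hypothetical names, and the DIRECTORY_SEPARATOR test is just one simple way to detect Windows on PHP 7.0.

```php
<?php
// Sketch only: use_utf8_console() and its return labels are hypothetical.
// It tries the PHP >= 7.1 API first and falls back to chcp on PHP 7.0.
function use_utf8_console()
{
    if (DIRECTORY_SEPARATOR !== '\\') {
        return 'not-windows';           // nothing to change outside Windows
    }
    if (function_exists('sapi_windows_cp_set')) {
        sapi_windows_cp_set(65001);     // PHP >= 7.1 on Windows
        return 'sapi_windows_cp_set';
    }
    shell_exec('chcp 65001');           // PHP 7.0 fallback described above
    return 'chcp';
}

use_utf8_console();
echo "Hello, पीएचपी", PHP_EOL;
```

Calling the helper once at the top of a CLI script keeps the version check in one place, so the same script can run on both the legacy and the current PHP installation.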

Aizzat Suhardi