I have a hex string that I want to convert to a string, but I get strange characters. Why?

$string = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
$productnamehex = hex2bin($string);

Result:

Reloj Nami � SKMEI � Acu�tico

It should show:

Reloj Nami – SKMEI – Acuático

I tried utf8_encode and utf8_decode, but nothing seems to work.

PHP tester code:

<?php
$string = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
$productnamehex = hex2bin($string);
echo $productnamehex;
Thomas Dickey
Club
  • What makes you say it "should" show `Reloj Nami – SKMEI – Acuático`? Because the problematic character in your string uses a single byte value `96` surrounded by spaces (hex `20`), which is decimal 150, which in ASCII, ANSI, and UTF-8 is smack dab in the middle of the [C1 controls in the Latin-1 Supplement block](https://en.wikipedia.org/wiki/Latin-1_Supplement) (0x0080-0x009F). There is literally no way for it to be an en-dash, because that has hex code `2013`, which is decimal value 8211. Your code is 100% doing the right thing here =) – Mike 'Pomax' Kamermans Jun 14 '22 at 01:12
  • Hello, go to http://phptester.net/ and run `$string = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F"; $productnamehex = hex2bin($string); echo $productnamehex;`. It shows weird characters. – Club Jun 14 '22 at 01:19
  • Yes, they are saying that whatever gave you `52656C6F6A204E616D69209620534B4D4549209620416375E17469636F` gave it wrong. That is not `Reloj Nami – SKMEI – Acuático`. The dashes are malformed – user3783243 Jun 14 '22 at 01:36
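The byte values discussed in these comments can be checked directly. This is a minimal sketch, assuming PHP 7.2+ with the mbstring extension (for `mb_ord()` and `mb_check_encoding()`); the variable names are illustrative only. It compares the single "dash" byte in the question's data with the en dash code point and its UTF-8 encoding.

```php
<?php
// Decode the question's hex string into raw bytes.
$bytes = hex2bin("52656C6F6A204E616D69209620534B4D4549209620416375E17469636F");

// "Reloj Nami " is 11 bytes, so the first "dash" is the byte at offset 11.
$dashByte = $bytes[11];
printf("byte in data: 0x%s (decimal %d)\n", bin2hex($dashByte), ord($dashByte));

// The en dash as a Unicode code point, and the bytes it takes in UTF-8.
$enDash = "\u{2013}";
$codePoint = mb_ord($enDash, 'UTF-8');
printf("en dash: U+%04X (decimal %d), UTF-8 bytes %s\n", $codePoint, $codePoint, bin2hex($enDash));

// The raw data is not valid UTF-8, which is why the browser shows U+FFFD marks.
var_dump(mb_check_encoding($bytes, 'UTF-8'));
```

This prints `byte in data: 0x96 (decimal 150)`, `en dash: U+2013 (decimal 8211), UTF-8 bytes e28093` and `bool(false)`: the data holds a one-byte dash, not the three-byte UTF-8 sequence.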

1 Answer

Your string is encoded in Microsoft's cp1252 (Windows-1252).

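The function utf8_encode() is misleadingly named: it translates your string only partially because it only works with ISO-8859-1. cp1252 is a superset of ISO-8859-1's printable characters that also assigns characters such as en dashes and em dashes to the bytes 0x80-0x9F, which ISO-8859-1 reserves for control codes. Your string contains two en dashes (byte 0x96).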
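To see why the source encoding matters, here is a minimal sketch that decodes the same byte 0x96 both ways. It assumes the mbstring extension and uses `mb_convert_encoding()` with `'ISO-8859-1'` to reproduce what utf8_encode() does (utf8_encode() itself is deprecated as of PHP 8.2); `$asLatin1` and `$asCp1252` are illustrative names.

```php
<?php
// A single en dash byte as it appears in cp1252 data.
$byte = "\x96";

// ISO-8859-1 (the only input utf8_encode() understands): 0x96 maps to the
// invisible C1 control character U+0096.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// cp1252: 0x96 maps to the en dash U+2013.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'cp1252');

printf("ISO-8859-1: U+%04X\n", mb_ord($asLatin1, 'UTF-8'));
printf("cp1252:     U+%04X\n", mb_ord($asCp1252, 'UTF-8'));
```

The first line prints `U+0096` and the second `U+2013`, which is why a Latin-1 conversion leaves the dashes broken while the accented `á` (0xE1, identical in both encodings) comes out fine.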

To properly convert the string:

$hex = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
$cp1252 = hex2bin($hex);
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'cp1252');

var_dump($hex, $cp1252, $utf8);

Output:

string(58) "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F"
string(29) "Reloj Nami � SKMEI � Acu�tico"
string(34) "Reloj Nami – SKMEI – Acuático"

See also: UTF-8 all the way through

Be warned that text encoding is rarely obvious just from looking at the data, and even functions that purport to detect the encoding are only making educated guesses. If it weren't for the dashes (byte 0x96, an en dash in cp1252 but a control character in ISO-8859-1), it simply wouldn't be possible to know which encoding it was for certain, since the `á` (0xE1) is the same in both.
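If you cannot get the encoding from the data's source, one common fallback is to treat anything that is not valid UTF-8 as cp1252. This is a minimal sketch of that guess, assuming the input is only ever UTF-8 or cp1252; `toUtf8Guess()` is a hypothetical helper, not a PHP function, and it will silently pass through any cp1252 text that happens to also be valid UTF-8.

```php
<?php
// Hypothetical helper: ASSUMES input is either UTF-8 or cp1252.
// This is a heuristic, not real detection.
function toUtf8Guess(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8 (this includes pure ASCII)
    }
    // Not valid UTF-8, so fall back to the assumed legacy encoding.
    return mb_convert_encoding($bytes, 'UTF-8', 'cp1252');
}

$raw = hex2bin("52656C6F6A204E616D69209620534B4D4549209620416375E17469636F");
echo toUtf8Guess($raw), "\n";
```

Recording the encoding alongside the data is still the reliable option; this only papers over the cases where that metadata was lost.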

Text encoding is important metadata that must be tracked alongside the data itself.

Sammitch
  • Amazing! Is there any way to replace the problematic "–" character with a plain "-"? – Club Jun 14 '22 at 01:50
  • `$productnamehex = str_replace("–", "-", $productnamehex); $productnamehex = str_replace("-", "-", $productnamehex); $productnamehex = str_replace("–", "-", $productnamehex);` – Club Jun 14 '22 at 01:51
  • I tried that and it worked. You are amazing, thank you very much! Is there a better way to do it, something more optimal and clean? – Club Jun 14 '22 at 01:51
  • I mean replace the "–" – Club Jun 14 '22 at 01:57
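The chained str_replace() calls can be collapsed into one, because str_replace() accepts an array of search strings. This is a sketch, assuming the text has first been converted to UTF-8 as in the answer; including the em dash (U+2014) in the list is an assumption about what else you might want to normalize.

```php
<?php
$hex = "52656C6F6A204E616D69209620534B4D4549209620416375E17469636F";
$utf8 = mb_convert_encoding(hex2bin($hex), 'UTF-8', 'cp1252');

// One call: every needle in the array is replaced by an ASCII hyphen-minus.
$dashes = ["\u{2013}", "\u{2014}"]; // en dash, em dash
$clean = str_replace($dashes, '-', $utf8);

echo $clean, "\n"; // Reloj Nami - SKMEI - Acuático
```

Converting to UTF-8 first keeps the `á` intact; the replacement then only touches the listed dash characters.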