My boss is forcing me to use an Access .mdb database (yes, I'm serious) on a PHP server. I can connect to it and retrieve data, but as you can imagine, I have encoding problems because I want to work in UTF-8.

So far I have two "solutions" for converting Windows-1252 to UTF-8.

This is the first way:

mb_convert_encoding($string, "UTF-8", "Windows-1252").

It works, but special characters are not converted properly: for example, º becomes \u00ba and Ó becomes \u00d3.

This is the second way:

mb_convert_encoding(mb_convert_encoding($string, "UTF-8", "Windows-1252"), "HTML-ENTITIES", "UTF-8")

It works too, but the same thing happens: special characters are not converted correctly. For example, º becomes &ordm;.

Does anybody know how to change the encoding properly, including special characters? Or how to convert &ordm; and \u00ba into something readable?

Blue
JFValdes
    Check out the related answer [here](http://stackoverflow.com/a/28341697/2144390). However, bear in mind that there will always be *some* compatibility issues using PHP+Access+Unicode (e.g., arbitrary Unicode parameter values in SQL queries won't work), so your boss may want to re-think his/her edict re: using that combination of technologies. – Gord Thompson Jul 09 '16 at 17:12

2 Answers

I ran a simple test that converts \uXXXX code point escapes back into characters:

<?php
function codepoint_decode($str) {
    return json_decode(sprintf('"%s"', $str));
}

$string_with_codepoint = "Ahed \u00d3\u00ba\u00d3";
// $string_with_codepoint = mb_convert_encoding($string, "UTF-8", "Windows-1252");
$output = codepoint_decode($string_with_codepoint);
echo $output; // Ahed ÓºÓ

Credit goes to this answer.
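
The question also asks about the HTML entity form (for example `&ordm;`). As a minimal sketch, assuming the string already holds entities such as those produced by the "HTML-ENTITIES" conversion, PHP's built-in html_entity_decode() can turn them back into UTF-8 characters. The sample string below is made up for illustration.

```php
<?php
// Hypothetical input: text whose special characters were turned into HTML entities.
$withEntities = "N&ordm; 1 &Oacute;scar";

// Decode named and numeric entities into UTF-8 characters.
$decoded = html_entity_decode($withEntities, ENT_QUOTES | ENT_HTML5, "UTF-8");

echo $decoded, "\n";
```

This is only needed if the entities are already stored somewhere; if the output is meant to be UTF-8 anyway, skipping the "HTML-ENTITIES" step avoids the round trip.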

Community
Ahed Eid

I finally found the solution. I had the solution from the beginning but I was doing my tests wrong.

My bad.

The right way to do it, for me, is mb_convert_encoding($string, "UTF-8", "Windows-1252").

But I was checking the result like this:

$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo json_encode($stringUTF8);

That's why I was getting Unicode escapes like \u20ac. If I had done this instead:

$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo $stringUTF8;

I would have seen the correct output from the beginning. It was json_encode() that was turning the special characters into \uXXXX escapes.
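
If the JSON output itself should contain readable characters, one option is the JSON_UNESCAPED_UNICODE flag of json_encode() (available since PHP 5.4). The sketch below assumes a made-up two-byte Windows-1252 sample standing in for data read from the database; it compares the default escaped output with the unescaped one.

```php
<?php
// Hypothetical sample: "Ó" and "º" as Windows-1252 bytes (0xD3, 0xBA).
$string = "\xD3\xBA";

// Same conversion as above: Windows-1252 bytes to UTF-8.
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");

// By default, json_encode() escapes non-ASCII characters as \uXXXX.
$escaped = json_encode($stringUTF8);

// With JSON_UNESCAPED_UNICODE the UTF-8 characters are written as they are.
$unescaped = json_encode($stringUTF8, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";
echo $unescaped, "\n";
```

Both forms are valid JSON and decode to the same string, so the escaped output was never a sign of a broken conversion.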

Thanks everybody for your help!!

JFValdes