I have read some other threads on this subject but I cannot understand what I am doing wrong.

I have a function

public function reEncode($item)
{
    if (! mb_detect_encoding($item, 'utf-8', true)) {
        $item = utf8_encode($item);
    }

    return $item;
}

I am writing a test for this. I want to test a string that is not UTF-8 to see if this statement is hit. I am having trouble creating the test string.

$contents = file_get_contents('CyrillicKOI8REncoded.txt');
var_dump(mb_detect_encoding($contents));

$sanitized = $this->reEncode($contents);
var_dump(mb_detect_encoding($sanitized));

Initially I used file_get_contents() on files I had saved in Sublime Text with various encodings: Cyrillic (KOI8-R), HEX and DOS (CP 437). It has been stated that file_get_contents() ignores the file encoding and simply returns the raw bytes, and this seems to be true, as the characters returned are a jumbled mess.

That said, every time I use mb_detect_encoding() on these variables, I always get ASCII or UTF-8. The statement is never triggered because ASCII is a subset of UTF-8.
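A likely explanation, assuming the default mbstring configuration: when no candidate list is passed, mb_detect_encoding() only chooses among the encodings in mb_detect_order(), which is commonly just ASCII and UTF-8, so nothing else can ever be returned. A minimal sketch, with the KOI8-R bytes written as escapes so the result does not depend on how the script file is saved (the variable name is illustrative):

```php
<?php
// "кот" encoded as KOI8-R, written as raw bytes.
$koi8 = "\xCB\xCF\xD4";

// Candidates used when no encoding list is passed; the exact list
// depends on the mbstring configuration.
var_dump(mb_detect_order());

// Strict detection against UTF-8 only: these bytes are not well-formed
// UTF-8, so no candidate matches and false is returned.
var_dump(mb_detect_encoding($koi8, 'UTF-8', true)); // bool(false)

// mb_check_encoding() answers the yes/no question directly.
var_dump(mb_check_encoding($koi8, 'UTF-8'));        // bool(false)
```

For a "is this valid UTF-8?" check like the one in reEncode(), mb_check_encoding() may be the more direct tool, since it validates rather than guesses.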

So I have tried mb_convert_encoding() and iconv() to convert a basic string to UTF-16, UTF-32, base64, hex and so on, but every time mb_detect_encoding() returns ASCII or UTF-8.
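Part of the confusion here may be that base64 and hex are not character encodings at all: their output consists only of ASCII characters. Similarly, for short Cyrillic text, every byte of the UTF-32BE form happens to be below 0x80, so the byte string is also well-formed UTF-8 and a detector cannot rule UTF-8 out. A small sketch (variable names are illustrative):

```php
<?php
// "Котёнок" built from Unicode escapes (PHP 7+), so it is UTF-8
// regardless of how this script file is saved.
$text = "\u{041A}\u{043E}\u{0442}\u{0451}\u{043D}\u{043E}\u{043A}";

// base64 and hex produce plain ASCII output.
$base64 = base64_encode($text);
$hex    = bin2hex($text);
var_dump(mb_check_encoding($base64, 'ASCII')); // bool(true)
var_dump(mb_check_encoding($hex, 'ASCII'));    // bool(true)

// UTF-32BE stores U+041A as the bytes 00 00 04 1A: all below 0x80.
$utf32 = mb_convert_encoding($text, 'UTF-32BE', 'UTF-8');
var_dump(strlen($utf32));                      // int(28)
var_dump(mb_check_encoding($utf32, 'UTF-8'));  // bool(true)
```

So a test string for the non-UTF-8 branch needs bytes that are actually invalid as UTF-8, which single-byte encodings such as KOI8-R tend to produce for Cyrillic text.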

In my tests I want to assert the encoding type before and after this function is called.

$sanitized = $this->reEncode($contents);

$this->assertEquals('UTF-32', mb_detect_encoding($contents));
$this->assertEquals('UTF-8', mb_detect_encoding($sanitized));

I cannot understand what basic mistake I am making that causes mb_detect_encoding() to always return ASCII or UTF-8.

myol
  • Just a note (I didn't read your post fully): keep in mind that the encoding of the PHP file, the encoding of the text file, PHP's internal encoding setting, the database connection setup (if used), and which functions you run and how you run them can all change the behavior. Beware of UTF-X ;-) I'm out – JustOnUnderMillions Feb 01 '17 at 17:09

1 Answer

OK, so it turns out you must pass the encoding you are checking for and set the strict argument to true; otherwise mb_detect_encoding() is next to useless.

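// Convert a UTF-8 literal to KOI8-R (source encoding given explicitly).
$item = mb_convert_encoding('Котёнок', 'KOI8-R', 'UTF-8');

$sanitized = $this->reEncode($item);

// Strict detection against a single candidate encoding.
$this->assertEquals('KOI8-R', mb_detect_encoding($item, 'KOI8-R', true));
$this->assertEquals('UTF-8', mb_detect_encoding($sanitized, 'UTF-8', true));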
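One caveat worth keeping in mind: utf8_encode() always treats its input as ISO-8859-1 (and is deprecated as of PHP 8.2), so the result for KOI8-R input is valid UTF-8 but not the original Cyrillic text. The sketch below is a hypothetical standalone variant of reEncode(); the name reEncodeSketch and the $sourceEncoding parameter are assumptions for illustration, not part of the original code:

```php
<?php
// Hypothetical variant of reEncode(): the caller states which encoding
// non-UTF-8 input is assumed to be in (ISO-8859-1 mirrors utf8_encode()).
function reEncodeSketch(string $item, string $sourceEncoding = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($item, 'UTF-8')) {
        $item = mb_convert_encoding($item, 'UTF-8', $sourceEncoding);
    }

    return $item;
}

$original = "\u{041A}\u{043E}\u{0442}"; // "Кот" as UTF-8
$koi8     = mb_convert_encoding($original, 'KOI8-R', 'UTF-8');

$asLatin1 = reEncodeSketch($koi8);           // same assumption as utf8_encode()
$asKoi8   = reEncodeSketch($koi8, 'KOI8-R'); // correct source encoding

var_dump(mb_check_encoding($asLatin1, 'UTF-8')); // bool(true)
var_dump($asLatin1 === $original);               // bool(false)
var_dump($asKoi8 === $original);                 // bool(true)
```

If the test should also guarantee that the text survives, comparing the sanitized string with the expected UTF-8 string is probably a stronger assertion than checking the detected encoding alone.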
myol