Is there a reliable way to determine the minimum number of bytes a character requires in a specific encoding, such as one of the encodings supported by the mbstring extension? The value would be 1 for UTF-8, 2 for UTF-16, etc.
I don't want to obtain the length of a particular string or char.
I want to know the minimum char size supported by a given encoding, according to its specification.
I currently use this code:
<?php
function flawed_detection($encoding)
{
    // I use 'a' in the hope that this char needs the fewest bytes in every supported encoding
    return strlen(mb_convert_encoding('a', $encoding, 'UTF-8'));
}

foreach (mb_list_encodings() as $encoding) {
    echo "$encoding: ", flawed_detection($encoding), "\n";
}
Partial output:
...
UTF-16LE: 2
UTF-8: 1
UTF-7: 1
UTF7-IMAP: 1
ASCII: 1
EUC-JP: 1
...
But I'm not sure which character is the "correct" one to use, if there is one at all.
edit: I've tested the brute-force approach with every character from U+0000 to U+10FFFF in every encoding, and the results are exactly the same as with my finally_not_so_flawed_detection function (with the 'a' char or with a space) :p
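For anyone who wants to repeat that kind of check, here is a minimal sketch of a brute-force scan. It is not necessarily the exact code used for the test above: the function name min_char_size_bruteforce and the default range (printable ASCII, to keep it fast) are made up for illustration. It also assumes that a character the target encoding cannot represent should be skipped, which is detected here with a round trip back to UTF-8, since mbstring would otherwise typically substitute a 1-byte '?' for it.

```php
<?php
// Hypothetical helper (name and default range are illustrative, not from mbstring):
// returns the smallest encoded size, in bytes, over the code points $from..$to.
function min_char_size_bruteforce(string $encoding, int $from = 0x20, int $to = 0x7E): int
{
    $min = PHP_INT_MAX; // stays PHP_INT_MAX if no code point in the range is representable
    for ($cp = $from; $cp <= $to; $cp++) {
        $char = mb_chr($cp, 'UTF-8');
        if ($char === false) {
            continue; // not a valid code point (e.g. a surrogate)
        }
        $encoded = mb_convert_encoding($char, $encoding, 'UTF-8');
        // Assumption: skip characters that do not survive a round trip,
        // because they were replaced by the substitute character.
        if (mb_convert_encoding($encoded, 'UTF-8', $encoding) !== $char) {
            continue;
        }
        $min = min($min, strlen($encoded));
    }
    return $min;
}

foreach (['UTF-8', 'UTF-16LE', 'UTF-32', 'ASCII'] as $encoding) {
    echo "$encoding: ", min_char_size_bruteforce($encoding), "\n";
}
```

Passing 0 and 0x10FFFF as the range covers every code point, at the cost of a much longer run per encoding.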