Looking down the rabbit hole, it appears that the comments in the documentation for mb_encode_numericentity are accurate, though somewhat cryptic.
The four major parts of each convmap entry appear to be:
start_code: The map affects items starting from this character code.
end_code: The map affects items up to and including this character code.
offset: A specific amount (positive or negative) added to the character code.
mask: Value used for the mask operation (the character code, after the offset is added, is bitwise ANDed with the mask value).
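The four fields can be read as a small per-character rule. Below is a rough model of that rule, assuming the offset is added before the mask is applied. The function name model_encode_code_point is hypothetical and is not part of mbstring; it only mirrors the behaviour the examples in this answer show.

```php
<?php
// Hypothetical helper (NOT part of mbstring): models how a single code point
// appears to be handled, based on the four documented convmap fields.
function model_encode_code_point(int $code, array $convmap): string
{
    // Walk the convmap four values at a time: start, end, offset, mask.
    foreach (array_chunk($convmap, 4) as [$start, $end, $offset, $mask]) {
        if ($code >= $start && $code <= $end) {
            // Assumed order of operations: add the offset, then apply the mask.
            return '&#' . (($code + $offset) & $mask) . ';';
        }
    }
    // Code points outside every range are returned unchanged.
    return mb_chr($code, 'UTF-8');
}

echo model_encode_code_point(162, [0x00, 0xff, 0, 0xff]), "\n"; // &#162;
```

This is only a sketch for reasoning about a convmap; for real conversions, call mb_encode_numericentity itself.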
Character codes can be visualized via character tables such as this Codepage Layout example for ISO-8859-1 encoding. (ISO-8859-1 is the encoding used in the original PHP documentation Example #2.) Looking at this encoding table, we can see that the convmap in that example is only meant to affect character codes from 0x80 (which appears to be blank for this particular encoding) through the final character in this encoding, 0xff (which appears to be ÿ).
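As a quick check of that range, here is a minimal sketch using the 0x80 to 0xff convmap on a UTF-8 string. The sample string is made up for illustration, and the \u{...} escapes are only there so the snippet does not depend on the file's encoding.

```php
<?php
// Only code points from 0x80 through 0xff fall inside this map,
// so plain ASCII characters should pass through untouched.
$convmap = [0x80, 0xff, 0, 0xff];

// "a", then ¢ (0xa2), then ÿ (0xff).
$original_str = "a\u{A2}\u{FF}";

$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo $converted_str, "\n"; // a&#162;&#255;
```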
To better understand the offset and mask features of convmap, here are some examples of how each one affects character codes. In all of the examples below, the character code has a value of 162 (the ¢ character):
Plain Example:
<?php
$original_str = "¢";
$convmap = array(0x00, 0xff, 0, 0xff);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n";
?>
Result:
original: ¢
converted: &#162;
Offset Example:
<?php
$original_str = "¢";
$convmap = array(0x00, 0xff, 1, 0xff);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n";
?>
Result:
original: ¢
converted: &#163;
Notes:
The offset seems to allow finer-grained control over the current start_code to end_code range of items to convert. For example, you might need to add an offset for one range of character codes in your convmap, but leave another range in the same convmap without an offset.
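To illustrate per-range offsets, here is a small sketch with two ranges in one convmap. The ranges and offsets are chosen purely for illustration: uppercase ASCII letters get no offset, lowercase letters get an offset of 1.

```php
<?php
// Each group of four values is one range: start_code, end_code, offset, mask.
$convmap = [
    0x41, 0x5a, 0, 0xff, // A-Z: no offset
    0x61, 0x7a, 1, 0xff, // a-z: offset of 1
];

$converted_str = mb_encode_numericentity("Ab", $convmap, "UTF-8");
echo $converted_str, "\n"; // &#65;&#99;
```

Here "A" (65) keeps its code, while "b" (98) is shifted to 99 because only the second range carries an offset.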
Mask Example:
<?php
// Mask Example 1
$original_str = "¢";
$convmap = array(0x00, 0xff, 0, 0xf0);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n\n";
// Mask Example 2
$convmap = array(0x00, 0xff, 0, 0x0f);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n\n";
// Mask Example 3
$convmap = array(0x00, 0xff, 0, 0x00);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n";
?>
Result:
original: ¢
converted: &#160;

original: ¢
converted: &#2;

original: ¢
converted: &#0;
Notes:
This answer does not intend to cover masking in great detail, but masking can help keep or remove certain bits from a given value.
Mask Example 1
In the first mask example, 0xf0, the f indicates that we want to keep the bits on the left side of the binary value. Here, f has a binary value of 1111 and 0 has a binary value of 0000, together becoming 11110000.
Then, when we do a bitwise AND operation with our character code (in this case 162, which has a binary value of 10100010), the operation looks like this:
  11110000
& 10100010
----------
  10100000
Converted back to decimal, 10100000 is 160, so the function emits the entity for character 160 (a non-breaking space).
Therefore, we've effectively kept the "left side" of the bits from the original character code value and gotten rid of the "right side" of the bits.
Mask Example 2
In the second mask example, the mask 0x0f (which has a binary value of 00001111) gives the following result in the bitwise AND operation:
  00001111
& 10100010
----------
  00000010
Converted back to decimal, this is 2.
Therefore, we've effectively kept the "right side" of the bits from the original character code value and gotten rid of the "left side" of the bits.
Mask Example 3
Finally, the third mask example shows what happens when using a mask of 0x00 (which is 00000000 in binary) in the bitwise AND operation:
  00000000
& 10100010
----------
  00000000
This results in 0, because a mask of all zeros keeps none of the original bits.
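The three bitwise AND calculations above can be reproduced directly in PHP. This is just a small sketch of the arithmetic, using the same character code (162) and the same three masks; it does not call mb_encode_numericentity.

```php
<?php
$code = 162; // character code of ¢, binary 10100010

foreach ([0xf0, 0x0f, 0x00] as $mask) {
    $result = $code & $mask;
    // %08b prints a value as an 8-digit, zero-padded binary number.
    printf("%08b & %08b = %08b (%d)\n", $mask, $code, $result, $result);
}
```

This prints:

```
11110000 & 10100010 = 10100000 (160)
00001111 & 10100010 = 00000010 (2)
00000000 & 10100010 = 00000000 (0)
```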