I ran into the UTF-8 BOM (byte order mark) while parsing a CSV file and found this neat function that solved the problem.
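    // Remove a leading UTF-8 BOM (bytes EF BB BF) from a string
    function remove_utf8_bom($text) {
        // Pack the hex string 'EFBBBF' into the three raw BOM bytes
        $bom = pack('H*', 'EFBBBF');
        // Strip the BOM only when it sits at the very start of the text
        $text = preg_replace("/^$bom/", '', $text);
        return $text;
    }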
Link: How to remove multiple UTF-8 BOM sequences before "<!DOCTYPE>"?
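For context, this is roughly how I am using the function. It is only a minimal sketch: the CSV content in `$csv` is made up for illustration, and the helper is repeated so the snippet runs on its own.

```php
<?php
// Same helper as above, repeated so this snippet is self-contained.
function remove_utf8_bom($text) {
    $bom = pack('H*', 'EFBBBF');
    return preg_replace("/^$bom/", '', $text);
}

// Hypothetical CSV content with a BOM in front of the header row.
$csv = "\xEF\xBB\xBFid,name\n1,Alice\n";

// First header cell before and after stripping the BOM.
$rawFirstCell   = explode(',', explode("\n", $csv)[0])[0];
$cleanFirstCell = explode(',', explode("\n", remove_utf8_bom($csv))[0])[0];

var_dump($rawFirstCell === 'id');   // bool(false): the BOM bytes are still attached
var_dump($cleanFirstCell === 'id'); // bool(true)
```

Without the cleanup, the first column name silently fails comparisons such as `=== 'id'`, which is how I noticed the problem in the first place.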
However, I don't completely understand how this works and was wondering if someone could explain what's happening here.
Some questions that I have:
- Is 'EFBBBF' a hex representation of the BOM Unicode character?
- What is H*? (I assume this is how we specify the format of the 'EFBBBF' string)
- Is it necessary to convert 'EFBBBF' to a binary representation?
- When I try to print the $bom variable, it's just an empty string. Why is the BOM invisible?
- How does preg_replace work with binary characters?
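To narrow these questions down, here is a small experiment I put together. It is a sketch rather than an explanation, and it assumes PHP 7 or later (for the `"\u{FEFF}"` escape). The variable names are mine.

```php
<?php
$bom = pack('H*', 'EFBBBF');

// The packed value is not empty: it holds three raw bytes.
var_dump(strlen($bom));   // int(3)
var_dump(bin2hex($bom));  // string(6) "efbbbf"

// It matches the byte-escaped literal and the UTF-8 encoding of U+FEFF.
var_dump($bom === "\xEF\xBB\xBF"); // bool(true)
var_dump($bom === "\u{FEFF}");     // bool(true)

// The unpacked hex text is a different, six-character ASCII string.
var_dump($bom === 'EFBBBF');       // bool(false)

// Without the /u modifier, the pattern is matched byte by byte,
// and ^ anchors it so only one leading BOM is removed.
$twice  = $bom . $bom . 'x';
$result = preg_replace("/^$bom/", '', $twice);
var_dump(bin2hex($result));        // string(8) "efbbbf78"
```

So the value does seem to be there even though nothing shows up when it is printed, and the regex appears to treat it as three literal bytes. What I am missing is the reasoning behind each of these results.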