Using the sample image from https://www.iptc.org/std-dev/photometadata/examples/google-licensable/example-page1.html and the following code:
getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    var_dump($iptc);
}
In the browser, the output shows this: � Copyright 2020 IPTC (Test Images) - www.iptc.org
That first character is supposed to be the copyright symbol. How do I ensure that special characters aren't converted into �?
Ultimately, the array needs to be passed through json_encode(), and I believe these characters are causing problems there.
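To make the json_encode() concern concrete, here is a minimal sketch. It assumes the copyright sign arrives as the single byte 0xA9 (its ISO-8859-1 encoding), which is a guess based on the � in the output; the `$value` string is a hypothetical stand-in, not data read from the sample image.

```php
<?php
// Hypothetical stand-in for one IPTC value: 0xA9 is the copyright sign in
// ISO-8859-1, but a lone 0xA9 byte is not a valid UTF-8 sequence.
$value = "\xA9 Copyright 2020 IPTC";

// json_encode() requires valid UTF-8 and returns false otherwise.
$json = json_encode([$value]);
var_dump($json); // bool(false)
echo json_last_error_msg(), "\n";

// Since PHP 7.2 this flag substitutes U+FFFD for invalid bytes instead of
// failing; it hides the problem rather than recovering the character.
$substituted = json_encode([$value], JSON_INVALID_UTF8_SUBSTITUTE);
echo $substituted, "\n"; // ["\ufffd Copyright 2020 IPTC"]
```

If this assumption holds, the fix is to convert the bytes to UTF-8 before encoding, not to change how the JSON is produced.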
UPDATE 1:
Per the suggestion of @6opko to use utf8_encode, I added this to my code:
array_walk_recursive($iptc, function (&$entry) {
    $entry = utf8_encode($entry);
});
This fixed the problem with the copyright symbol. However, at index ["2#000"][0] of the iptcparse() result, I'm getting \u0000\u0004. I suspect this has something to do with a part of the IPTC specification that I don't understand yet (and the value might actually be correct). I'm investigating.
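One way to test the "it might be correct" theory is to treat those two bytes as a binary number, not text. The sketch below assumes that 2#000 holds a two-byte big-endian integer (my reading is that it is the IIM record version, but that should be checked against the IPTC IIM specification); `$raw` is a hypothetical stand-in for `$iptc["2#000"][0]`.

```php
<?php
// Hypothetical stand-in for $iptc["2#000"][0]: the raw bytes 0x00 0x04.
$raw = "\x00\x04";

// "n" reads an unsigned 16-bit big-endian integer; unpack() returns a
// 1-indexed array when no key name is given.
$version = unpack('n', $raw)[1];
var_dump($version); // int(4)

// Left as a string, the same bytes are JSON-escaped as control characters,
// which matches what shows up in the encoded output.
$escaped = json_encode($raw);
echo $escaped, "\n"; // "\u0000\u0004"
```

Under that assumption the value is not corrupted; it just should not go through a text-encoding conversion like the other entries.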
UPDATE 2:
Since utf8_encode() is deprecated, I tried adding this to my script:
ini_set('default_charset', 'UTF-8');
That didn't work. I also changed the array_walk_recursive() callback to use mb_convert_encoding($entry, 'UTF-8'), and that didn't work either.
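For comparison, here is a minimal sketch of the same callback with the source encoding spelled out. It assumes the IPTC strings are ISO-8859-1, which is what utf8_encode() also assumed, and that the mbstring extension is loaded; the `$iptc` array and its `2#116` key are hypothetical sample data, not the parsed image.

```php
<?php
// Hypothetical sample value: ISO-8859-1 bytes, where 0xA9 is the copyright sign.
$iptc = ["2#116" => ["\xA9 Copyright 2020 IPTC"]];

array_walk_recursive($iptc, function (&$entry) {
    // Name the source encoding explicitly. Without the third argument,
    // mb_convert_encoding() takes the mbstring internal encoding (usually
    // UTF-8) as the source, so Latin-1 bytes are not actually converted.
    $entry = mb_convert_encoding($entry, 'UTF-8', 'ISO-8859-1');
});

// JSON_UNESCAPED_UNICODE keeps the copyright sign readable in the output.
echo json_encode($iptc, JSON_UNESCAPED_UNICODE), "\n";
```

If the image declares a different character set, the third argument would need to change accordingly, so this is a starting point to verify, not a general fix.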