Using the sample image from https://www.iptc.org/std-dev/photometadata/examples/google-licensable/example-page1.html and the following code:

getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    var_dump($iptc);
}

In the browser, the output shows this: � Copyright 2020 IPTC (Test Images) - www.iptc.org

That first character is supposed to be the copyright symbol. How do I ensure that special characters aren't converted into �?

Ultimately, the array needs to be json_encode()d. I believe these characters are causing problems.

UPDATE 1:

Per the suggestion of @6opko to use utf8_encode, I added this to my code:

array_walk_recursive($iptc, function (&$entry) {
    $entry = utf8_encode($entry);
});

This fixed the problem with the copyright symbol. However, in the index ["2#000"][0] of iptcparse result, I'm getting \u0000\u0004. I feel this might have something to do with the IPTC specification that I do not understand yet (and it might be correct, actually). I'm investigating.

UPDATE 2:

Since utf8_encode() is deprecated, I tried adding this to my script:

ini_set('default_charset', 'UTF-8');

That didn't work. I changed the implementation of the array_walk_recursive to use mb_convert_encoding($entry, 'UTF-8') -- and that also didn't work.

StackOverflowNewbie
  • Is this on the browser or your command line? – nice_dev Oct 24 '22 at 08:10
  • I see it in my browser. – StackOverflowNewbie Oct 24 '22 at 08:15
  • Can you do utf8_encode()? – 6opko Oct 24 '22 at 08:23
  • Most likely it is a display problem than the content of the metadata inside the image being the culprit. Adding a `` should fix the issue. It is anyways just an `©` thing. – nice_dev Oct 24 '22 at 08:29
  • @nice_dev: I thought so too, so I tested it, but it doesn't work. – KIKO Software Oct 24 '22 at 08:30
  • @KIKOSoftware Strange. Will try it on my machine as well then. – nice_dev Oct 24 '22 at 08:30
  • @nice_dev And the image is not corrupt because it shows correctly [here](https://getpmd.iptc.org/getpmd/html/isearch1/ipmd/?imgurl=https://www.iptc.org/std-dev/photometadata/examples/google-licensable/images/IPTC-GoogleImgSrcPmd_testimg01.jpg). – KIKO Software Oct 24 '22 at 08:32
  • @nice_dev - I've tried calling `header('content-type:text/html;charset=utf-8');` before outputting anything. The problem persists. I don't want to assume that this problem is exclusively a `©` thing. – StackOverflowNewbie Oct 24 '22 at 08:34
  • @6opko - `utf8_encode()` fixed the copyright symbol issue (I had to do an `array_walk_recursive`). It didn't fix everything, though. There are still some weird encoded characters. Will update the original post. – StackOverflowNewbie Oct 24 '22 at 08:42
  • @StackOverflowNewbie Change the default encoding then. By default it is ISO-8859-1. [See this thread](https://stackoverflow.com/questions/9351694/setting-the-php-default-encoding-to-utf-8) . Relying on `utf8_encode` isn't a good idea as mentioned in the warning section in the [doc](https://www.php.net/manual/en/function.utf8-encode.php) – nice_dev Oct 24 '22 at 08:47
  • @KIKOSoftware It was apparently an encoding format issue from PHP's end. – nice_dev Oct 24 '22 at 08:48
  • @nice_dev - I tried setting `ini_set( 'default_charset', 'UTF-8' );` in my code and removed the use of `utf8_encode()`. I'm back to my original problem. How can I get rid of the deprecated function? I tried `mb_convert_encoding`, and that didn't work. – StackOverflowNewbie Oct 24 '22 at 08:57

1 Answer

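So, it is an encoding issue: iptcparse() returns the raw bytes stored in the image, and in this file they appear to be ISO-8859-1 rather than UTF-8. PHP's default_charset has been UTF-8 since PHP 5.6; for older versions, see the thread linked in the comments above to change the default settings.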

utf8_encode() works here, but it is deprecated as of PHP 8.2.

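So, it is best to use mb_convert_encoding to convert the string from one character encoding to another. Pass the source encoding explicitly as the third argument: if it is omitted, the internal encoding (usually UTF-8) is assumed as the source, so the ISO-8859-1 bytes are not converted correctly.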

Snippet:

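<?php

// The APP13 segment of the JPEG holds the IPTC block.
getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    // 2#116 is the copyright notice; convert it from ISO-8859-1 to UTF-8.
    if ($iptc !== false && isset($iptc['2#116'][0])) {
        echo mb_convert_encoding($iptc['2#116'][0], "UTF-8", "ISO-8859-1");
    }
}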
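
Since the array ultimately has to go through json_encode(), the same conversion can be applied to every entry. The following is a minimal sketch under stated assumptions, not a definitive implementation: iptcToUtf8() is a hypothetical helper name, the array is a hand-built stand-in for iptcparse() output, and any entry that is not already valid UTF-8 is assumed to be ISO-8859-1. It also treats 2#000 as the IPTC record version, a 2-byte big-endian integer rather than text, which would explain the \u0000\u0004 seen after utf8_encode().

```php
<?php

// Hypothetical helper (not part of PHP): returns a copy of an
// iptcparse()-style array that is safe to pass to json_encode().
// Assumption: text that is not valid UTF-8 is in $sourceEncoding.
function iptcToUtf8(array $iptc, string $sourceEncoding = 'ISO-8859-1'): array
{
    $result = [];
    foreach ($iptc as $tag => $values) {
        foreach ($values as $value) {
            if ($tag === '2#000') {
                // Record version: binary 16-bit big-endian integer, not text.
                $result[$tag][] = strlen($value) === 2 ? unpack('n', $value)[1] : null;
            } elseif (mb_check_encoding($value, 'UTF-8')) {
                // Already valid UTF-8 (this includes plain ASCII): keep as-is.
                $result[$tag][] = $value;
            } else {
                // Legacy bytes: convert with an explicit source encoding.
                $result[$tag][] = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
            }
        }
    }
    return $result;
}

// Hand-built sample imitating iptcparse() output; "\xA9" is the
// ISO-8859-1 byte for the copyright symbol.
$iptc = [
    '2#000' => ["\x00\x04"],
    '2#116' => ["\xA9 Copyright 2020 IPTC (Test Images) - www.iptc.org"],
];

// JSON_THROW_ON_ERROR (PHP 7.3+) surfaces any remaining invalid bytes
// as an exception instead of a silent false.
$json = json_encode(iptcToUtf8($iptc), JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);
echo $json, PHP_EOL;
```

With real data, pass the result of iptcparse() instead of the sample array. If the image was written with a different legacy encoding, change the second argument accordingly.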
nice_dev