3

I have a JSON array of objects in a .json file. I can read the file with file_get_contents: $str = file_get_contents($jsonFile); However, when I call json_decode on the content, I just get null. Below is some of the content of the .json file:

[
{
    "accreditation": false,
    "category.en": "Administration and Management",
    "category.fr": "Administration et gestion",
    "clientele.en": null,
    "clientele.fr": null,
    "courseid": 11749,
    "duree": "",
    "dureeminutes": 0,
    "establishmentaltname": "06-ciusss-cusm",
    "establishmentfullname": "Centre universitaire de santé McGill",
    "fcpresponsable": "",
    "idnumber": "",
    "idnumberalt": "",
    "imgurl": null,
    "ispartageable": false,
    "keywords": null,
    "lastupdate": 0,
    "modalite.en": "In Person",
    "modalite.fr": "En présentiel",
    "nombreinscriptions": 1,
    "parentestablishmentfullname": "Territoire CUSM",
    "parentestablishmentshortname": "CUSM-FCP",
    "partageable": "Locale",
    "shortname.en": "Formation Context 04072022 12h41",
    "shortname.fr": "Formation Context 04072022 12h41",
    "summary.en": "",
    "summary.fr": "",
    "title.en": "Formation Context 04072022 12h41",
    "title.fr": "Formation Context 04072022 12h41",
    "visible": false
},
{
    "accreditation": false,
    "category.en": "Administration and Management",
    "category.fr": "Administration et gestion",
    "clientele.en": null,
    "clientele.fr": null,
    "courseid": 11748,
    "duree": "",
    "dureeminutes": 0,
    "establishmentaltname": "06-ciusss-cusm",
    "establishmentfullname": "Centre universitaire de santé McGill",
    "fcpresponsable": "",
    "idnumber": "",
    "idnumberalt": "",
    "imgurl": null,
    "ispartageable": false,
    "keywords": null,
    "lastupdate": 0,
    "modalite.en": "In Person",
    "modalite.fr": "En présentiel",
    "nombreinscriptions": 1,
    "parentestablishmentfullname": "Territoire CUSM",
    "parentestablishmentshortname": "CUSM-FCP",
    "partageable": "Locale",
    "shortname.en": "Formation Contexte 040722 08h51m",
    "shortname.fr": "Formation Contexte 040722 08h51m",
    "summary.en": "",
    "summary.fr": "",
    "title.en": "Formation Contexte 040722 08h51m",
    "title.fr": "Formation Contexte 040722 08h51m",
    "visible": true
},
{
    "accreditation": false,
    "category.en": "Administration and Management",
    "category.fr": "Administration et gestion",
    "clientele.en": null,
    "clientele.fr": null,
    "courseid": 11747,
    "duree": "",
    "dureeminutes": 0,
    "establishmentaltname": "06-ciusss-cusm",
    "establishmentfullname": "Centre universitaire de santé McGill",
    "fcpresponsable": "",
    "idnumber": "",
    "idnumberalt": "",
    "imgurl": null,
    "ispartageable": false,
    "keywords": null,
    "lastupdate": 0,
    "modalite.en": "In Person",
    "modalite.fr": "En présentiel",
    "nombreinscriptions": 1,
    "parentestablishmentfullname": "Territoire CUSM",
    "parentestablishmentshortname": "CUSM-FCP",
    "partageable": "Locale",
    "shortname.en": "Formation Contexte 04072022",
    "shortname.fr": "Formation Contexte 04072022",
    "summary.en": "",
    "summary.fr": "",
    "title.en": "Formation Contexte 04072022",
    "title.fr": "Formation Contexte 04072022",
    "visible": false
}]

How can I convert it into a valid JSON string for PHP, or into an array? The JSON itself is valid, but after I call file_get_contents the string contains line breaks and \n, as shown here: https://3v4l.org/5Zd7O Below is a snippet of my code:

$str = file_get_contents('jsondump.json');
var_dump(gettype($str));
var_dump($str);

$jsonArr = json_decode($str, true); // decode the JSON into an associative array
var_dump($jsonArr);
echo json_last_error_msg();

I tried detecting the encoding with mb_detect_encoding() and converting it with mb_convert_encoding(); however, the result is still the same. I did:

$str = file_get_contents($jsonFile); 

$encoding = mb_detect_encoding($str, 'UTF-8, ISO-8859-1', true);
$str2 =  mb_convert_encoding($str, 'UTF-8', $encoding);
var_dump(gettype($str));
var_dump($str);
var_dump($str2);
var_dump($encoding);

When I display the var_dump results, $encoding is string(5) "UTF-8". The first $str looks like this:

[{\n "accreditation": false,\n "category.en": "Template",\n "category.fr": "Gabarit",\n "clientele.en": n ull,\n "clientele.fr": null,\n "courseid": 816,\n "duree": "1h00m",\n "dureeminutes": 60,\n "establishmentaltname": "06-ciusss-cusm",\n "establishmentfullname": "Centre universitaire de sant \xc3\xa9 McGill",\n "fcpresponsable": "",\n "idnumber": "",\n "idnumberalt": "",\n "imgurl": null,\n "ispartageable": true,\n "keywords": null,\n "lastupdate": 1483246800,\n "m odalite.en": "Online",\n "modalite.fr": "En ligne",\n "nombreinscriptions": 6,\n "parentestablishmentfullname": "Territoire CUSM",\n "parentestablishmentshortname": "CUSM-FCP",\n "partageable": "Pa rtageable",\n "shortname.en": "E-learning Course Template",\n "shortname.fr": "gabarit d\'une formation en ligne",\n "summary.en": "This template is to be used when creating an e-learning course as part of the F CP program. It is important that we standardize the training structure to allow users a more user friendly experience. ",\n "summary.fr": "Ce gabarit devra \xc3\xaatre utilis\xc3\xa9 lors de la cr\xc3\xa9ation d\'un cours FCP en ligne. Il est important d\'uniformiser la structure de formation afin de permettre une exp\xc3\xa9rience plus conviviale aux apprenants.",\n "title.en": "FCP E-learning Course Template",\n "title.fr": "FCP Gabarit de formation en ligne",\n "visible": false\n }]

$str2 is identical to $str.

Karan
  • Use `echo json_last_error_msg()` to see the reason. – Barmar Sep 08 '22 at 16:49
  • For us to help you, you should provide the code that is actually erroring. You have provided only a snippet of the JSON content, not the full file, so that's of no use. We also can't see how you're supposedly parsing it in PHP; we can only assume, and that is of no benefit to anyone. – Barkermn01 Sep 08 '22 at 16:51
  • The text you posted is valid JSON: https://jsonlint.com/. SUGGESTIONS: 1) Show us your json_decode(). 2) Add json_last_error_msg() to determine the exact error. 3) Look at these examples: https://www.php.net/manual/en/function.json-decode.php – paulsm4 Sep 08 '22 at 16:52
  • In addition, did you try decoding only the JSON snippet you shared? Did it work? If yes, there could be an error in another part of the full JSON file. – Uwe Sep 08 '22 at 16:56
  • @Barmar I tried echo json_last_error_msg(); I just get a syntax error – Karan Sep 08 '22 at 17:05
  • The file must have something else besides what you posted. Maybe some invisible control characters. Check it with a hex editor. – Barmar Sep 08 '22 at 17:08
  • @paulsm4 yes it is a valid JSON, I verified this before – Karan Sep 08 '22 at 17:24
  • @Barkermn01 I have added my code snippet – Karan Sep 08 '22 at 17:25
  • Please do the following: 1) Add these three lines: `$text1 = file_get_contents()`, `$encoding = mb_detect_encoding($text1, 'UTF-8, ISO-8859-1', true);` and `$text2 = mb_convert_encoding($text1, 'UTF-8', $encoding);` 2) [Edit] your post. Show us the code and the results. 3) Please confirm the original file is OK. 4) Please tell us the encoding of the original file (e.g. French/ISO 8859-1). – paulsm4 Sep 08 '22 at 20:22
  • The json you posted works fine. Maybe post the exact output of file_get_contents from the file. var_dump the contents directly – Yeak Sep 08 '22 at 21:07
  • @Yeak- please see the thread below: https://stackoverflow.com/a/73653095/421195 – paulsm4 Sep 08 '22 at 21:28

2 Answers

2
  1. The JSON text you posted is OK. Unfortunately, that's NOT the text you're passing to json_decode(). Hence the error.

  2. Assuming your original .json file is OK, it appears that file_get_contents() is corrupting the JSON text.

  3. SUGGESTION:

http://truelogic.org/wordpress/2018/08/19/php-file_get_contents-for-utf-encoded-content/

One of the problems of file_get_contents() is that it messes up the data if the file contains special characters outside the standard ASCII character set.

The solution is to convert the encoding of the contents to UTF-8, but only after it has detected the desired encoding. So for instance if we know the file contains European languages like Spanish or French then we specify the detection for ISO-8859-1. For Arabic it would be ISO-8859-6 and so on.

function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    // Detect the source encoding (strict mode), then convert to UTF-8.
    $encoding = mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true);
    return mb_convert_encoding($content, 'UTF-8', $encoding);
}

It sounds like your file is French/ISO-8859-1, and it sounds like all you have to do is use mb_convert_encoding() to convert it to UTF-8 before attempting json_decode().

See also mb_detect_encoding for more details.
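
As a rough sketch of the convert-then-decode step described above: the helper name `decode_json_as_utf8` and the sample string are made up for illustration, and the candidate encodings (UTF-8, ISO-8859-1) are an assumption about the input. It requires the `mbstring` extension and PHP 7.3+ for `JSON_THROW_ON_ERROR`.

```php
<?php
// Hypothetical helper: decode JSON text after normalising it to UTF-8.
function decode_json_as_utf8(string $raw): array
{
    // Strict detection; returns false when no candidate encoding matches.
    $encoding = mb_detect_encoding($raw, ['UTF-8', 'ISO-8859-1'], true);
    if ($encoding === false) {
        throw new RuntimeException('Could not detect the input encoding');
    }
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);

    // JSON_THROW_ON_ERROR raises a JsonException instead of returning null.
    return json_decode($utf8, true, 512, JSON_THROW_ON_ERROR);
}

// Made-up sample: JSON stored as ISO-8859-1 bytes ("\xE9" is "é").
$latin1 = "[{\"modalite.fr\": \"En pr\xE9sentiel\", \"courseid\": 11749}]";
$rows = decode_json_as_utf8($latin1);
echo $rows[0]['modalite.fr'], PHP_EOL;
```

Throwing on failure makes the cause visible immediately, rather than leaving a silent null to be checked with json_last_error_msg() afterwards.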


Per the OP, he's reading a perfectly legal JSON file like this:

[
{
    "accreditation": false,
    "category.en": "Administration and Management",
    "category.fr": "Administration et gestion",
    "clientele.en": null,
    "clientele.fr": null,
    "courseid": 11749,
    ...
    "lastupdate": 0,
    "modalite.en": "In Person",
    "modalite.fr": "En présentiel",
    "nombreinscriptions": 1,
    ...
    "partageable": "Locale",

... but file_get_contents() is corrupting the text, like this:

[{
        "accreditation": false,
        "category.en": "Template",
        "category.fr": "Gabarit",
        "clientele.en": n ull,
        ...
        "m odalite.en": "Online",
        "modalite.fr": "En ligne",
        "nombreinscriptions": 6,
        ...
        "partageable": "Pa rtageable",

file_get_contents() doesn't always "play nice" with non-ASCII, multi-byte text, per the link I cited above. A common solution is to call mb_convert_encoding() to convert the string to UTF-8. I gave an example above.

It appears, however, that the OP's input text is corrupted badly enough that mb_convert_encoding() doesn't work. I can't explain this.

SUGGESTED ALTERNATIVE: read the bytes directly (instead of using file_get_contents()). Then call mb_convert_encoding(), to ensure json_decode() gets UTF-8 text:

Is there an alternative to file_get_contents?

fwrite() and UTF8

https://stackoverflow.com/a/31214886/421195
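
A minimal sketch of the byte-level read suggested above, assuming the file is small enough to hold in memory. The helper name `read_file_bytes`, the temporary file, and its contents are hypothetical stand-ins for the OP's jsondump.json.

```php
<?php
// Hypothetical helper: read a whole file as raw bytes with fopen()/fread().
function read_file_bytes(string $path): string
{
    $handle = fopen($path, 'rb'); // 'b' = binary mode, no newline translation
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bytes = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            fclose($handle);
            throw new RuntimeException("Read error on $path");
        }
        $bytes .= $chunk;
    }
    fclose($handle);
    return $bytes;
}

// Demo with a temporary file standing in for the real JSON file.
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, "[\n{\n    \"courseid\": 11749,\n    \"visible\": false\n}]");

// Converting UTF-8 to UTF-8 replaces any invalid byte sequences.
$text = mb_convert_encoding(read_file_bytes($path), 'UTF-8', 'UTF-8');
$data = json_decode($text, true);
echo json_last_error_msg(), PHP_EOL;
unlink($path);
```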

@Karan -

Q: Are you SURE the input file is 100% OK? There seem to be a few minor discrepancies between the examples.

Q: Have you looked at one of the "bad" files in a hex editor? Perhaps the "mysterious spaces" might be due to "hidden characters" that would only show up if you viewed the file in hex?

Q: What's your PHP version? Perhaps upgrading might resolve the problem?
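
If no hex editor is at hand, the same check can be approximated from PHP. This is only a sketch: `find_suspicious_bytes` is a made-up helper and the sample string (a UTF-8 byte order mark in front of a tiny array) is invented for illustration. Note that it also flags the bytes of legitimate accented characters such as "é", so the offsets still need to be read by eye.

```php
<?php
// Hypothetical helper: list every byte that is neither printable ASCII,
// tab, LF nor CR, keyed by its offset in the string.
function find_suspicious_bytes(string $bytes): array
{
    $found = [];
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $code = ord($bytes[$i]);
        $isPlain = ($code >= 0x20 && $code <= 0x7E)
            || $code === 0x09 || $code === 0x0A || $code === 0x0D;
        if (!$isPlain) {
            $found[$i] = sprintf('0x%02X', $code);
        }
    }
    return $found;
}

// Made-up sample: a UTF-8 byte order mark followed by a tiny JSON array.
$sample = "\xEF\xBB\xBF[{\"courseid\": 11749}]";
print_r(find_suspicious_bytes($sample));

// bin2hex() gives a quick hex view of the first few bytes.
echo bin2hex(substr($sample, 0, 8)), PHP_EOL;
```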

paulsm4
  • I tried this, but the UTF-8 characters are still incorrectly encoded; it didn't fix the issue – Karan Sep 08 '22 at 18:17
  • Please: 1) Modify your code. Make separate statements for `$text1 = file_get_contents()`, `$encoding = mb_detect_encoding($text1, 'UTF-8, ISO-8859-1', true); ` and `$text2 = mb_convert_encoding($text1, 'UTF-8', $encoding);` 2) Get the values for $text1, $encoding and $text2 (e.g. "echo", or whatever's convenient), 3) [Edit] your post. Show us exactly what you tried; copy/paste the results. Also confirm 1) the original file is OK, and 2) the original file is encoded as 8859-1 text. – paulsm4 Sep 08 '22 at 18:23
  • @Karan: Q: Any updates? The problem seems to be file_get_contents() is "corrupting" the text in your JSON file. I believe the solution is probably mb_convert_encoding(). But we need to be methodical: take a small step at a time. We absolutely shouldn't "assume" at any step along the way. Please update your post with my suggestions above. – paulsm4 Sep 08 '22 at 21:26
  • You may wish to note that this requires the `mbstring` extension to be installed and enabled; it does not come with a standard `install php` on most *nix-based OSes. Also, allowing fread from remote sources is now bad practice and blocked by default; you should use cURL, and since you're already expecting the mbstring extension, it's not too much to expect the cURL extension as well. – Barkermn01 Sep 09 '22 at 09:00
  • @paulsm4 Sorry for being late, but I have updated the question please check – Karan Sep 09 '22 at 13:02
  • Thank you. I'm concerned about this: `"clientele.en": n ull,` (SPACE in "null" JS keyword), this: `"m odalite.en": "Online",` (SPACE in JS name) and these: `"gabarit d\'une formation en ligne"`, `"Ce gabarit devra \xc3\xaatre utilis\xc3\xa9 lors de la cr\xc3\xa9ation d\'un cours FCP en ligne.` (ESCAPE SEQUENCES in text). Q: Are these in the original JSON file as well? – paulsm4 Sep 09 '22 at 15:27
  • @paulsm4 No, the original JSON file has no spacing like that; I only get this after I run file_get_contents. I have provided a sample of the original file in the code snippet. Just FYI, I am running this as a PHP script – Karan Sep 09 '22 at 16:24
  • @paulsm4, I am accepting your answer. The issue was with `file_get_contents`. I used the `file` function instead, which reads the file contents into an array, and then used implode to convert it back to a string: `$str= file($jsonFile); $str = implode("", $str);` – Karan Sep 09 '22 at 21:09
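
The workaround from the last comment, written out as a runnable sketch. The temporary file and its contents are made up to stand in for the OP's JSON file; only the `file()` and `implode()` calls come from the comment itself.

```php
<?php
// Temporary file standing in for the OP's JSON file (made-up content).
$jsonFile = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($jsonFile, "[\n{\n    \"courseid\": 11749,\n    \"visible\": false\n}]");

// file() returns the file as an array of lines, newlines included.
$lines = file($jsonFile);
if ($lines === false) {
    throw new RuntimeException("Cannot read $jsonFile");
}
// Joining with an empty string restores the original text.
$str = implode('', $lines);

$jsonArr = json_decode($str, true);
if ($jsonArr === null && json_last_error() !== JSON_ERROR_NONE) {
    echo 'Decode failed: ', json_last_error_msg(), PHP_EOL;
} else {
    var_dump($jsonArr[0]['courseid']);
}
unlink($jsonFile);
```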
-2

I tried the JSON data you provided and it works fine. Check that the path you pass to file_get_contents() is the correct location of your JSON file.

Below is the example: https://3v4l.org/bFS59

  • This is really a comment, not an answer. With a bit more rep, [you will be able to post comments](//stackoverflow.com/privileges/comment). – Barmar Sep 08 '22 at 17:08
  • @MohammedJhosawa The problem seems to appear after I call file_get_contents($jsonFile), which inserts line breaks and empty spaces. I have updated my question – Karan Sep 08 '22 at 17:13