
First of all, I appreciate there are lots of answers about dealing with large JSON files. However, I have yet to find one that covers my scenario.

The problem I face is that I have large JSON files (12 MB) that look like this:

{
  "range": "Sheet1!A1:P40571", 
  "majorDimension": "ROWS",
  "values": [
    [
      "new_id",
      "qty",
      "total_job_cost",
      "total_job_revenue",
      "total_job_profit",
      "total_job_margin"
    ],
    [
      "34244",
      "5",
      "211.25",
      "297.00",
      "85.75",
      "28.87%"
    ],
    [
      "34244",
      "10",
      "211.25",
      "297.00",
      "85.75",
      "28.87%"
    ],
    ...
  ]
}

And I wish to extract the values array and convert it into a CSV that would look like this:

new_id,total_job_cost,total_job_revenue,total_job_profit,total_job_margin
34244,211.25,297.00,85.75,28.87%
34245,211.25,297.00,85.75,28.87%
...

However, since the values array is so large, when I try to extract it using a PHP library for JSON parsing, my server crashes when it tries to read it.

Any suggestions or tips appreciated. Thanks.

Jack Robson
  • without typing out the code, the key items you want are `json_decode`, `implode` and `foreach` – Forbs Nov 20 '17 at 17:55
  • 1
    Can you increase your memory limit? The thing about JSON, unless I'm mistaken, is that you have to parse it all. You can't read it line by line like some other formats. – Don't Panic Nov 20 '17 at 17:56
  • Also, post the code that you have right now. A 12MB JSON file shouldn't usually cause too much trouble unless you've got other problems in your code. – Sammitch Nov 20 '17 at 18:31
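Following the suggestions in the comments, the straightforward route can be sketched like this. It is a minimal, hedged sketch, not the asker's actual code: it assumes the memory limit can be raised far enough to hold the decoded 12 MB file, and the helper name `jsonToCsv` is made up for illustration. It keeps columns 0, 2, 3, 4, 5 (dropping "qty"), matching the desired output above.

```php
<?php
// Sketch: decode the whole file at once, then stream rows out with fputcsv.
// Assumes memory_limit is high enough for the full decode of a 12 MB file.
ini_set('memory_limit', '512M');

// jsonToCsv is a hypothetical helper name, not from the question.
function jsonToCsv(string $inFile, string $outFile): void
{
    $data = json_decode(file_get_contents($inFile), true);
    $out  = fopen($outFile, 'w');

    // Keep columns 0, 2, 3, 4, 5 (drop "qty"), as in the desired CSV.
    foreach ($data['values'] as $row) {
        fputcsv($out, [$row[0], $row[2], $row[3], $row[4], $row[5]]);
    }
    fclose($out);
}
```

Note that `fputcsv` only quotes fields that need it, so simple numeric values come out unquoted.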

2 Answers


You can read JSON line by line, but not with any built-in libraries. I wrote a simple JSON parser for another answer here:

Convert structure to PHP array

I had to make a slight modification to handle "real" JSON. In the switch, change this token:

 case 'T_ENCAP_STRING':
      if( $mode == 'key'){
           $key .= trim($content,'"');
      }else{
           $value .= unicode_decode($content); //encapsulated strings are always content
      }
      next($lexer_stream); //consume a token
 break;

You can test it here

http://sandbox.onlinephpfunctions.com/code/b2917e4bb8ef847df97edbf0bb8f415a10d13c9f

and find the full (updated) code here

https://github.com/ArtisticPhoenix/MISC/blob/master/JasonDecoder.php

Can't guarantee it will work, but it's worth a shot. It should be fairly easy to modify it to read your file.

ArtisticPhoenix

If the problem is simply to convert the large JSON file to a CSV file, then perhaps a jq solution is admissible. Depending on the computing environment, jq can generally handle large files (GB) breezily, and with a little more effort it can usually handle even larger files, as it has a "streaming parser".

In any case, here is a jq solution to the problem as stated:

jq -r '(.values[] | [.[0,2,3,4,5]]) | @csv' data.json > extract.csv

For the sample input, this produces:

"new_id","total_job_cost","total_job_revenue","total_job_profit","total_job_margin"
"34244","211.25","297.00","85.75","28.87%"
"34244","211.25","297.00","85.75","28.87%"

This is valid CSV, and the use of @csv guarantees that the result, if any, will be valid CSV. If you want the quotation marks removed, there are several options, though whether they are "safe" will depend on the data. Here is an alternative jq filter that produces comma-separated values; it uses join(",") instead of @csv:

 (.values[] | [.[0,2,3,4,5]]) | join(",")
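Should the file ever outgrow jq's default in-memory parser, the same extraction can be sketched with the streaming parser mentioned above. This is a sketch under assumptions, not tested against the real 12 MB file; it assumes the same data.json layout, with the rows under a top-level "values" key:

```shell
# Rebuild each row of .values from streamed events, then emit CSV.
# "2 | truncate_stream" drops the leading ["values", <row index>] path prefix,
# so fromstream reassembles one row array at a time instead of the whole file.
jq -rn --stream '
  fromstream(2 | truncate_stream(inputs | select(.[0][0] == "values")))
  | [.[0,2,3,4,5]] | @csv
' data.json > extract.csv
```

The select filter discards stream events for the other top-level keys ("range", "majorDimension") before reconstruction.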
peak