
I have a log file that can look something like this:

[{"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}]

[{"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"},
 {"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}]

As you can see it's an array of JSON objects. I would like to parse this log so I can convert it back into PHP arrays that I can loop through. I have tried explode("]", $logContents), but this interferes with the contents of the array (the values also contain a ']' character). The log file can be massive (20 MB+), so I cannot rely on exploding on more than one character (e.g. explode("}]", $logContents)) because then the operation takes too long. I'm sure there must be an easier way to do this!

At the end I'd like to have an array of arrays of the JSON log items. In the given example we'd have an array containing 2 arrays. 1st array would have 1 log item, and the 2nd array would have 2 log items.

Allen S
  • Would json_decode help? http://php.net/manual/en/function.json-decode.php – JustBaron Jan 26 '17 at 10:59
  • I think **json_decode()** is what you are looking for. – SajeshBahing Jan 26 '17 at 11:01
  • Is each item on its own line? In other words, if you read the file line by line, would you be able to explode the elements each line contains? – Ken Jan 26 '17 at 11:02
  • Another suggestion: look into the Linux `awk` command to preprocess your log file into a standard JSON file. – Ken Jan 26 '17 at 11:02
  • Possible duplicate of [json\_decode to array](http://stackoverflow.com/questions/5164404/json-decode-to-array) – yivi Jan 26 '17 at 11:03
  • *"As you can see it's an array of JSON objects"* -- no, it's not. [JSON](https://en.wikipedia.org/wiki/JSON) is a text representation of some data structure. You don't have objects and you don't have an array. You have a file that contains JSONs. Most probably, each line is a separate JSON. Read one line, [`json_decode()`](http://php.net/manual/en/function.json-decode.php) it, inspect the values. Repeat until the end of file. – axiac Jan 26 '17 at 11:05
  • @axiac I understand that the log file contains unparsed text, rather than JSON objects and arrays, I should have been more clear. The file does NOT have JSONs on each separate line. Sometimes they are all on one line. Additionally, if I run json_decode() on the entire string, I cannot get an array of arrays... – Allen S Jan 26 '17 at 11:16
  • @user1775598 no, the log contains JSONs. But there is no such thing as "JSON array" or "JSON object". JSON is text. If it's properly formatted, it can be decoded into arrays or objects. More properly said, the information it contains can be used to create arrays and objects equivalent to those used to create it. It's obvious you cannot pass the entire log file content to `json_decode()`, because it contains more than one JSON. – axiac Jan 26 '17 at 11:36
  • Updated the answer with a possible solution for your problem. It should work if the entire content of the log file can be loaded in memory and processed at once. – axiac Jan 26 '17 at 11:56

4 Answers

0

Try with the following code:

$j_obj1 = '[{"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}]';

$j_obj2 = '[{"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"},
 {"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}]';

$j_arr1 = json_decode($j_obj1, true);
$j_arr2 = json_decode($j_obj2, true);

foreach ($j_arr1 as $data) {
    echo $data['ip']; // You can iterate
}

foreach ($j_arr2 as $data) {
    echo $data['prop1']; // You can iterate
}
0

Assuming the file contains one valid JSON per line, a possible fragment of code for your request is this:

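// FILE_SKIP_EMPTY_LINES avoids decoding the blank lines between entries
foreach (file($logpath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    $entry = json_decode($line, TRUE);
    if (!is_array($entry)) {
        continue;      // this line is not a complete JSON array
    }
    foreach ($entry as $item) {
        echo('IP: '.$item['ip'].'; prop1: '.$item['prop1']);       // etc
    }
}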

If the file is large this workflow doesn't work any more, because file() loads the whole file into memory at once. You can use fopen()/fgets()/fclose() to read one line at a time and process it:

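$fh = fopen($logpath, 'r');
// fgets() returns FALSE at end of file, so test its result instead of feof()
while (($line = fgets($fh)) !== FALSE) {
    $entry = json_decode($line, TRUE);
    if (!is_array($entry)) {
        continue;      // blank line or incomplete JSON
    }
    foreach ($entry as $item) {
        echo('IP: '.$item['ip'].'; prop1: '.$item['prop1']);       // etc
    }
}
fclose($fh);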

But if the assumption of one valid JSON per line is not met, none of the above code fragments work. In this case you'll have to implement a JSON parser yourself (or find one already implemented) that is able to read as much data from the input string as it needs until it finds a complete JSON string.
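A minimal sketch of that idea, assuming the whole log fits in memory as one string and that every top-level JSON in it is an array or an object. The function name `decode_concatenated_json()` is hypothetical, not a PHP built-in. It does not parse JSON itself: it only tracks bracket depth and whether the scan is inside a string literal, so it can find where each top-level document ends and hand that slice to `json_decode()`.

```php
<?php
// Hypothetical helper: split a string that holds several concatenated JSON
// documents (arrays or objects) and return one decoded value per document.
function decode_concatenated_json(string $text): array
{
    $results  = [];
    $depth    = 0;      // current nesting level of [] and {}
    $inString = false;  // true while scanning inside a JSON string literal
    $escaped  = false;  // true right after a backslash inside a string
    $start    = 0;      // offset where the current top-level document began
    $length   = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $ch = $text[$i];
        if ($inString) {
            // Brackets inside strings (e.g. "xxx [xxx]") must be ignored
            if ($escaped) {
                $escaped = false;
            } elseif ($ch === '\\') {
                $escaped = true;
            } elseif ($ch === '"') {
                $inString = false;
            }
            continue;
        }
        if ($ch === '"') {
            $inString = true;
        } elseif ($ch === '[' || $ch === '{') {
            if ($depth === 0) {
                $start = $i;
            }
            $depth++;
        } elseif (($ch === ']' || $ch === '}') && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // A complete top-level document spans $start..$i
                $decoded = json_decode(substr($text, $start, $i - $start + 1), true);
                if (is_array($decoded)) {
                    $results[] = $decoded;
                }
            }
        }
    }
    return $results;
}

// Sample data standing in for the log; note the brackets inside string values
$log = '[{"ip":"a","reason":"x [y]"}]' . "\n\n"
     . '[{"ip":"b"},' . "\n" . ' {"ip":"c","reason":"] ["}]';
$entries = decode_concatenated_json($log);
```

With this sample input, `$entries` holds two arrays: the first with one log item and the second with two. For a file too large to load at once, the same state machine could in principle be fed chunks read with fread(), but that variant is not shown here.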


Update

You say in a comment that the file does not contain one JSON per line. This renders the code above useless. However, if the file is not large and its entire content can be loaded in memory, there is hope. You can load the content of the file in memory, patch it to turn it into a single valid JSON, then decode it.

If all the JSONs from the file look like the ones you posted in the question (i.e. an array of objects) you can try to identify the sequences of characters ] and [ (or }] and [{) separated only by whitespace characters. This is where a JSON ends (}]) and the next one begins ([{). If you insert a comma between each pair of ] and [ and wrap everything in [ and ], the result should be a valid JSON that, when decoded, produces an array. Each element of that array is the array used to generate one JSON from the input file.

Let's try to write the code:

// Get the entire content of the log file in memory in $text
$text  = file_get_contents($logpath);
// Try to patch the content of the file to generate a larger JSON
$fixed = '['.preg_replace('/]\s*\[/', '],[', $text).']';
// Decode the JSON to arrays
$all   = json_decode($fixed, TRUE);

// json_decode() returns NULL on failure; if $all is an array then we did it!
if (is_array($all)) {
    foreach ($all as $entry) {
        // $entry is one entry from the original log
        // it used to be an array of objects on the source
        // but we decoded the objects to associative arrays
        foreach ($entry as $item) {
            echo('IP: '.$item['ip'].'; prop1: '.$item['prop1']);       // etc
        }
    }
}

The regexp

The regular expression used to identify the boundaries of the original JSONs, split into pieces:

]         # the ']' character, there is nothing special about it
\s        # match a whitespace character (e.g. space, tab, newline)
*         # the previous sub-expression (\s) repeated zero or more times
\[        # match the '[' character; it is a special character in regexps
          # and needs to be escaped here to make it "unspecial".
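
To see the regexp at work, here is a small sketch on made-up sample text (the values are placeholders, not real log data). It also shows a limitation worth keeping in mind: the pattern knows nothing about JSON strings, so a value that itself contains `]`, optional whitespace and `[` gets rewritten too.

```php
<?php
// Sample text standing in for the log: two JSON arrays separated by a blank line
$text  = '[{"ip":"a","reason":"x [y]"}]' . "\n\n" . '[{"ip":"b"},{"ip":"c"}]';
$fixed = '[' . preg_replace('/]\s*\[/', '],[', $text) . ']';
$all   = json_decode($fixed, true);
// "x [y]" is untouched because its ']' is followed by a quote, not by '['

// Caveat: a string value that contains "]  [" is matched as well
$tricky  = '[{"reason":"a ]  [ b"}]';
$patched = json_decode('[' . preg_replace('/]\s*\[/', '],[', $tricky) . ']', true);
// The result is still valid JSON, but the value is now "a ],[ b"
```

If such values can occur in the log, a scanner that tracks whether it is inside a string is a safer way to find the boundaries than a regexp.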
axiac
-1

First of all, you have a problem in your JSON: the last value needs to be enclosed in double quotes. Where you have "xxx [xxx]}, you should have "xxx [xxx]"}, with a closing ".

After that, just pass the string to the json_decode() function.
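
A short sketch of how a missing quote like that can be detected instead of guessed at. The helper name `decode_or_explain()` and the sample strings are made up for illustration; `json_last_error()` and `json_last_error_msg()` are standard PHP functions. Note that this only handles a string holding a single JSON.

```php
<?php
// Hypothetical helper: decode one JSON document, or return a message
// explaining why it could not be decoded.
function decode_or_explain(string $json)
{
    $data = json_decode($json, true);
    // json_decode() returns NULL both for the literal "null" and on failure,
    // so the error code has to be checked as well
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        return 'Invalid JSON: ' . json_last_error_msg();
    }
    return $data;
}

$good = '[{"id":"xxxxx","reason":"xxx [xxx]"}]';
$bad  = '[{"id":"xxxxx","reason":"xxx [xxx]}]';   // closing quote is missing

$ok     = decode_or_explain($good);   // array with one item
$failed = decode_or_explain($bad);    // string starting with "Invalid JSON: "
```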

-1
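// JSON text must be a quoted PHP string, and json_decode() takes one JSON at a time
$a = '[{"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}]';

$b = '[{"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}, {"ip":"XXX","prop1":"d","prop2":"xxx","prop3":{"index":0,"type":"xxx"},"id":"xxxxx","reason": "xxx [xxx]"}]';

$j = [json_decode($a, true), json_decode($b, true)];
// "decode" was not defined in the original snippet; this placeholder callback prints every leaf value
array_walk_recursive($j, function ($value, $key) { echo "$key: $value\n"; });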