9

I'm trying to process the following with an JSON Input step:

{"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]}

However this seems not to be possible:

Json Input.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : 
The data structure is not the same inside the resource! 
We found 1 values for json path [$..Locality], which is different that the number retourned for path [$..Street] (3509 values). 
We MUST have the same number of values for all paths.

The step provides Ignore Missing Path flag but it only works if all the rows misses the same path. In that case that step acts as as expected an fills the missing values with null.

This limits the power of this step to read uneven data, which was really one of my priorities.

My step Fields are defined as follows:

JSON Input Fields definition

Am I missing something? Is this the correct behavior?

rsilva4
  • 1,915
  • 1
  • 23
  • 39

2 Answers2

11

What I have done is use JSON Input using $.address[*] to read to a jsonRow field the full map of each element p.e:

{"address":[
    {"AddressId":"1_1","Street":"A Street"},  
    {"AddressId":"1_101","Street":"Another Street"},  
    {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},   
    {"AddressId":"1_102","Locality":"New York"} 
]}

This results in 4 jsonRows one for each element, p.e. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then using a Javascript step I map my values using this:

var AddressId = getFromMap('AddressId', jsonRow);
var Street = getFromMap('Street', jsonRow);
var Locality = getFromMap('Locality', jsonRow);

In a second script tab I inserted minified JSON parse code from https://github.com/douglascrockford/JSON-js and the getFromMap function:

function getFromMap(key,jsonRow){
  try{
   var map = JSON.parse(jsonRow);
  }
  catch(e){
   var message = "Unparsable JSON: "+jsonRow+" Desc: "+e.message;
   var nr_errors = 1;
   var field = "jsonRow";
   var errcode = "JSON_PARSE";
   _step_.putError(getInputRowMeta(), row, nr_errors, message, field, errcode);
   trans_Status = SKIP_TRANSFORMATION;
   return null;
  }

  if(map[key] == undefined){
   return null;
  }
  trans_Status = CONTINUE_TRANSFORMATION;
  return map[key]
}
rsilva4
  • 1,915
  • 1
  • 23
  • 39
  • it would be nice if you could copy answer elements here from your link :) (and accept your own answer btw) – redben Nov 05 '12 at 17:28
  • 1
    Done! Was the answer useful to you? – rsilva4 Nov 05 '12 at 21:37
  • 1
    Four years late, but still - very nice idea. I'm still a bit disappointed that the JSON input step doesn't support this by design, because it's kind of expensive to first break down the object into elements and then compile a JS script. +1 for the resourcefulness. – Yuval Herziger Jun 15 '16 at 14:25
  • 10 years later. Pentaho still hasn't fixed this issue. – Nikhil Feb 07 '23 at 14:52
3

You can solve this by changing the JSONPath and splitting up the steps in two JSON input steps. The following website explains a lot about JSONPath: http://goessner.net/articles/JsonPath/

$..AddressId

Does in fact return all the AddressId's in the address array, BUT since Pentaho is using grid rows for input and output [4 rows x 3 columns], it can't handle a missing value aka null value when you want as results return all the Streets (3 rows) and return all the Locality (2 rows), simply because there are no null values in the array itself as in you can't drive out of your garage with 3 wheels on your car instead of the usual 4.

I guess your script returns null (where X is zero) values like:

A S X
A S X
A S L
A X L

The scripting step can be avoided same by changing the Fields path of the first JSONinput step into:

$.address[*]

This is to retrieve all the 4 address lines. Create a next JSONinput step based on the new source field which contains the address line(s) to retrieve the address details per line:

$.AddressId
$.Street
$.Locality

This yields the null values on the four address lines when a address details is not available in an address line.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
bsecker
  • 86
  • 2
  • I tested your suggestion. No luck, still failing. Check it out here http://pastebin.com/8Ez68rpV – rsilva4 Jul 15 '13 at 18:13
  • Hi rsilva, you were right. I just made this work, by using a first json input step to get all the address lines and another one for the obtaining the Street / AddressId and Locality if available: http://pastebin.com/iRqNJMPF – bsecker Jul 29 '13 at 11:32
  • 1
    This worked well for me and seems more elegant than the accepted answer. Accepted one should work too, but it's nicer to avoid Javascripting your way to the solution when possible. – codemonkey Oct 20 '17 at 01:55