2

I've explored couple of existing JSON query language such JMESPath, JsonPath and JSONiq. Unfortunately, none of them seem to be able to support my use case in a generic way.

Basically, I'm receiving different type of responses from different web services. I need to give the ability to the user to remap the response in a 2 dimensional array in other to leverage our visualization tool. Based on the new format, the user can decide how to display his data between existing widgets. Pretty much like a customisable dashboard entirely managed on the UI.

Anyway my input looks like:

{
  "category_1": [
    {
      "name": "medium",
      "count": 10
    },
    {
      "name": "high",
      "count": 20
    }
  ],
  "category_2": [
    {
      "name": "medium",
      "count": 30
    },
    {
      "name": "high",
      "count": 40
    }
  ]
}

expected output:

[
  {
    "name": "medium",
    "count": 10,
    "category": "1"
  },
  {
    "name": "high",
    "count": 20,
    "category": "1"
  },
  {
    "name": "medium",
    "count": 30,
    "category": "2"
  },
  {
    "name": "high",
    "count": 40,
    "category": "2"
  }
]

The closer I went is with JMESPath but my query isn't dynamic at all. The user needs to be aware of possible category of grouping.

The query looks like: [ category_1[].{name: name, count: count, category: '1'}, category_2[].{name: name, count: count, category: '2'} ] | []

In other words, I need an enough powerful JSON query language to perform this JavaScript code:

const output = flatMap(input, (value, key) => {
  return value.map(x => {
    return { ...x, category: key };
  });
});

Any thoughts?

mathpaquette
  • 406
  • 4
  • 16

5 Answers5

2

This is indeed not currently possible in JMESPath (0.15.x). There are other spec compliant JMESPath packages that (with a bit of extra effort) will do what you require. Using NPM package @metrichor/jmespath (a typescript implementation) you could extend it with the functions you require as follows:


import {
  registerFunction,
  search,
  TYPE_ARRAY,
  TYPE_OBJECT
} from '@metrichor/jmespath';

registerFunction(
  'flatMapValues',
  ([inputObject]) => {
    return Object.entries(inputObject).reduce((flattened, entry) => {
      const [key, value]: [string, any] = entry;

      if (Array.isArray(value)) {
        return [...flattened, ...value.map(v => [key, v])];
      }
      return [...flattened, [key, value]];
    }, [] as any[]);
  },
  [{ types: [TYPE_OBJECT, TYPE_ARRAY] }],
);

With these extended functions a JMESPath expression would now look like this to remap the key into every value:

search("flatMapValues(@)[*].merge([1], {category: [0]})", {
  "category_1": [
    {
      "name": "medium",
      "count": 10
    },
    {
      "name": "high",
      "count": 20
    }
  ],
  "category_2": [
    {
      "name": "medium",
      "count": 30
    },
    {
      "name": "high",
      "count": 40
    }
  ]
});

// OUTPUTS:

[
  {
    category: 'category_1',
    count: 10,
    name: 'medium',
  },
  {
    category: 'category_1',
    count: 20,
    name: 'high',
  },
  {
    category: 'category_2',
    count: 30,
    name: 'medium',
  },
  {
    category: 'category_2',
    count: 40,
    name: 'high',
  },
]

That said you could just register the function you wrote above and use it

Glen van Ginkel
  • 126
  • 1
  • 2
  • Could be a potential solution. I don't like the fact Ill have to maintain additional queries though. – mathpaquette May 21 '20 at 12:05
  • accepted this answer because this one can be managed on the UI. However, if you need a powerful language you should definitely look at JSONiq but requires more deployment effort. – mathpaquette May 26 '20 at 14:32
1

Finally, managed a way with JSONiq using Zorba implementation. Definitively the way to go if you need powerful JSON queries. Apparently this has been integrated in Apache Spark with Rumble

Anyway, here's my solution:

jsoniq version "1.0";

let $categories := 
{
  "category_1": [
    {
      "name": "medium",
      "count": 10
    },
    {
      "name": "high",
      "count": 20
    }
  ],
  "category_2": [
    {
      "name": "medium",
      "count": 30
    },
    {
      "name": "high",
      "count": 40
    }
  ]
}

for $key in keys($categories), $row in flatten($categories.$key)
    return {"count": $row.count, "name": $row.name, "category": $key}

output:

{ "count" : 10, "name" : "medium", "category" : "category_1" }{ "count" : 20, "name" : "high", "category" : "category_1" }{ "count" : 30, "name" : "medium", "category" : "category_2" }{ "count" : 40, "name" : "high", "category" : "category_2" }

You can try Zorba here.

Ghislain Fourny
  • 6,971
  • 1
  • 30
  • 37
mathpaquette
  • 406
  • 4
  • 16
  • 2
    Very glad you managed. I would add that `flatten()` would also recursively flatten nested arrays (arrays in arrays in arrays...) into a flat sequence. The alternate builtin function `members()` or even simply a postfix `[]` that unboxes the array is enough for the purpose, because the data has just a one-dimensional array of objects. (FYI in case you try with the Zorba link, this try page runs on an older version of Zorba, 2.9, in which array unboxing used to be done with () instead of []). I hope this helps! – Ghislain Fourny May 21 '20 at 21:00
0

This is an alternative possibility in JSONiq that does not explicitly list the keys in each row, with the merge constructor {| |}:

jsoniq version "1.0";

let $categories := 
{
  "category_1": [
    {
      "name": "medium",
      "count": 10
    },
    {
      "name": "high",
      "count": 20
    }
  ],
  "category_2": [
    {
      "name": "medium",
      "count": 30
    },
    {
      "name": "high",
      "count": 40
    }
  ]
}
for $key in keys($categories),
    $row in members($categories.$key)
return {|
  $row,
  { "category": $key }
|}

For the sake of completeness, this is the reverse query that would turn the output back into the original input (which uses a group by clause):

jsoniq version "1.0";
let $output :=
(
  { "count" : 10, "name" : "medium", "category" : "category_1" },
  { "count" : 20, "name" : "high", "category" : "category_1" },
  { "count" : 30, "name" : "medium", "category" : "category_2" },
  { "count" : 40, "name" : "high", "category" : "category_2" }
)
return
{|
  for $row in $output
  group by $category := $row.category
  return { $category : [ $row ] }
|}
Ghislain Fourny
  • 6,971
  • 1
  • 30
  • 37
  • could you tell us more why Zorba doesn't have anymore activity on GitHub? Is it still prodable ? What about existing issues? – mathpaquette May 22 '20 at 11:45
  • Just a big code base vs. few resources. Zorba 3.0, however, is stable and does the job -- for at least as long as it continues to work with OS updates. – Ghislain Fourny May 26 '20 at 05:58
  • 1
    Given the evolution towards the JSON-like simplicity of most formats (JSON, Parquet, Avro, etc) and towards big datasets, I chose to invest my time and efforts into the development of Rumble, which is lean (JSON is considerably simpler than XML) and easier to maintain. It supports large datasets as well as small datasets and is not slower than Zorba as it can parallelize on local cores, even without a cluster. It is in a (maturing) beta and actively maintained, and it is free and open source as well. – Ghislain Fourny May 26 '20 at 06:01
0

This is simple with ~Q (disclaimer: I'm the developer).

{
   "results:{}:[]": [{
       "{}:":".",
       "category":"$key"
   }]
}

Output:

{
    "results": [
        {
            "name": "medium",
            "count": 10,
            "category": "category_1"
        },
        {
            "name": "high",
            "count": 20,
            "category": "category_1"
        },
        {
            "name": "medium",
            "count": 30,
            "category": "category_2"
        },
        {
            "name": "high",
            "count": 40,
            "category": "category_2"
        }
    ]
}

Edit: some more info to explain the syntax:

"results:{}:[]"

The :{} part means "iterate over all keys in the object", :[] means "iterate over all array elements".

"{}:":"."

This copies each field in the current object to the output.

"category":"$key"

Add a field called "category", with the current traversed key as value.

If we wanted to get the numbers (i.e. 1,2,... instead of category_1, category_2, etc), we can use substr:

"category": "$key substr(9)"
Sela M.
  • 21
  • 2
-1

You actually don't need any additional libs for that. Here is a small function which does the trick. You only need to split the key.

const transform = (obj) => {
    const ret = [];
    for (let key in obj) {
        const tmp = key.split('_');
        for (let item of obj[key]) {
            ret.push({
                ...item,
                [tmp[0]]: tmp[1],
            });
        }
    }
    return ret;
};

const result = transform(obj);
Andre
  • 458
  • 5
  • 10