
If I want to compare a range of API responses by the complexity of the response (as a proxy for how much effort it likely takes to parse and validate the response), is there any existing tool or library that can do that effectively? Or a simple bit of code?

Ideally something that prints out a quick report showing how deep and wide the whole structure is, along with any other metrics that might be useful.
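To make that concrete, here is the rough kind of report I have in mind (a minimal sketch; the metric names and helper are just illustrative):

let structureReport = (val) => {
  // Walk a parsed JSON value, tracking max depth, max width, and node count
  let maxDepth = 0, maxWidth = 0, nodes = 0;
  (function walk(v, depth) {
    nodes++;
    maxDepth = Math.max(maxDepth, depth);
    if (v !== null && typeof v === 'object') {
      let children = Array.isArray(v) ? v : Object.values(v);
      maxWidth = Math.max(maxWidth, children.length);
      for (let child of children) walk(child, depth + 1);
    }
  })(val, 1);
  return { maxDepth, maxWidth, nodes };
};

console.log(structureReport({ a: { b: 2 }, c: [{}, {}, { d: [1, 2, 3] }] }));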

  • Maximum depth of nesting? – Botje Aug 06 '20 at 13:03
  • Regardless of my answer, I am skeptical that knowing the "complexity" of a json object can be of much use. Perhaps knowing the size of a json object can be useful, but the "complexity" almost definitely shouldn't matter... – Gershom Maes Aug 06 '20 at 22:08
  • In my particular context, I'm trying to come up with a rough sense of how expensive it is to parse, process, validate, transform, etc., and I've realised that size is not a good enough proxy by itself (e.g., a JSON object with a single key and a 1 MB string value would be pretty simple to validate). – Jun-Dai Bates-Kobashigawa Aug 06 '20 at 22:10
  • I had thought of a couple of ways to implement it by hand, but was curious if there were any standard metrics to describe a JSON structure (and libraries to calculate them). There are similar things for code complexity, and things like "wc" for text documents, measures of org complexity in company org charts (depth and SPOC). – Jun-Dai Bates-Kobashigawa Aug 06 '20 at 22:15
  • @Jun-DaiBates-Kobashigawa The simplest (and probably not even bad) measure would be `jsonText.length`. Very efficient, and pretty meaningful - most json parsers are limited by input size and (read) speed, not depth of nesting or something. – Bergi Aug 06 '20 at 23:05
  • @Jun-DaiBates-Kobashigawa `how expensive it is to parse, process, validate, transform` – I'd say: perform the necessary actions and measure how long it takes? (See the sketch below.) – Marco Sep 15 '22 at 16:13
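Following Marco's suggestion, a direct measurement can be as simple as this rough sketch (the payload is a stand-in; `performance.now()` is available in browsers and modern Node):

let payload = JSON.stringify({ a: { b: 2 }, c: [1, 2, 3] }); // stand-in for a real API response
let t0 = performance.now();
for (let i = 0; i < 1000; i++) JSON.parse(payload); // repeat the actual work you care about
let t1 = performance.now();
console.log(`avg parse time: ${((t1 - t0) / 1000).toFixed(4)} ms`);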

2 Answers


A simple heuristic is to count the number of {, }, [, and ] characters. Of course this is only a heuristic: under this method a json value like { "value": "{[}{}][{{}{}]{}{}{}[}}{}{" } would be considered overly complex even though its structure is very straightforward.

let guessJsonComplexity = (json, chars='{}[]') => {
  // `json` is the raw JSON text; returns the density of structural characters
  let count = 0;
  for (let char of json) if (chars.includes(char)) count++;
  return count / (json.length || 1);
};

You would go with this answer if speed is very important.
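For example (comparing a flat payload with a nested one of similar length):

let flat = '{"message":"hello hello hello hello hello"}';
let nested = '{"a":{"b":[{"c":1},{"d":[2,3]}]}}';
console.log(guessJsonComplexity(flat));   // low ratio: few structural characters
console.log(guessJsonComplexity(nested)); // higher ratio: mostly structure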

You'll almost certainly need to parse the json if you want a more precise answer!

We could also consider another approach: assign a "complexity score" to every possible phenomenon that can occur in json. For example:

  • A string s is included; complexity score: Math.log(1 + s.length)
  • A number n is included; complexity score: Math.log(1 + Math.abs(n))
  • A boolean is included; complexity score: 1
  • An array is included; complexity score: average complexity of elements + 1
  • An object is included; complexity score: average complexity of keys + average complexity of values + 1

We could even pick out distinct relationships, like "an object is included in an array" or "an array is included in an array", etc., if we want to consider some of these as more "complicated" than others. We could say, for example, that negative numbers are twice as "complicated" as positive numbers, if that's how we feel.

We can also consider a "depth factor", which makes elements count for more the deeper they go.

If we define how to score all these phenomena, we can write a function that processes json and applies such a score:

let isType = (val, Cls) => val != null && val.constructor === Cls;
let getComplexity = (json, d=1.05) => {
  
  // Here `d` is our "depth factor"
  
  return d * (() => {

    // Take the log of the length of a String (1 + guards against log(0) = -Infinity)
    if (isType(json, String)) return Math.log(1 + json.length);

    // Take the log of (1 + the absolute value of) any Number
    if (isType(json, Number)) return Math.log(1 + Math.abs(json));

    // Booleans and nulls always have a complexity of 1
    if (isType(json, Boolean) || json === null) return 1;

    // Arrays are 1 + (average complexity of their child elements)
    if (isType(json, Array)) {
      let avg = json.reduce((o, v) => o + getComplexity(v, d), 0) / (json.length || 1);
      return avg + 1;
    }

    // Objects are 1 + (average complexity of their keys) + (average complexity of their values)
    if (isType(json, Object)) {
      // `getComplexity` for Arrays will add 1 twice, so subtract 1 to compensate
      return getComplexity(Object.keys(json), d) + getComplexity(Object.values(json), d) - 1;
    }

    throw new Error(`Couldn't get complexity for ${json.constructor.name}`);
    
  })();
  
};

console.log('Simple:', getComplexity([ 'very', 'simple' ]));
console.log('Object:', getComplexity({
  i: 'am',
  some: 'json',
  data: 'for',
  testing: 'purposes'
}));
console.log('Complex:', getComplexity([
  [ 111, 222, 333, 444 ],
  [ 'abc', 'def', 'ghi', 'jkl' ],
  [ [], [], {}, {}, 'abc', true, false ]
]));
console.log('Deep:', getComplexity([[[[[[ 'hi' ]]]]]]));

If you want more detailed information on the children of a large json object, you can simply call getComplexity on those children as well.
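For example, a small (hypothetical) helper built on `getComplexity` that scores each top-level child separately:

// Score each top-level child of an object separately
let complexityReport = (obj) => Object.fromEntries(
  Object.entries(obj).map(([key, val]) => [key, getComplexity(val)])
);

console.log(complexityReport({ short: 'hi', deep: [[[[ 'nested' ]]]] }));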

Gershom Maes
  • I would also count `,`, `:` or `"` (or even the sequence `":`) to identify the number of properties and elements inside these structures (see the variation sketched below). – Bergi Aug 06 '20 at 23:07
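Since `guessJsonComplexity` above already takes the characters to count as a parameter, Bergi's variation is just a different argument:

let jsonText = '{"a":{"b":2},"c":[1,2,3]}'; // any raw JSON text
// Also count ',', ':' and '"' to roughly capture property/element counts
console.log(guessJsonComplexity(jsonText, '{}[],:"'));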

I'm using arbitrary values, but this is just to give you a starting point.

var data1 = { "a": { "b": 2 }, "c": [{}, {}, { "d": [1, 2, 3] }] }
var data2 = { "a": { "b": 2 }, "c": [{"x":"y","z":[0,1,2,3,4,5,6,7,8,9]}, {}, { "d": [1, 2, 3] }] }

function chkComplexity(obj) {
  let complexity = 0;
  let depth = 1;
  (function calc(obj) {
    for (const key of Object.keys(obj)) {
      if (typeof obj[key] !== "object") complexity += depth
      if (Array.isArray(obj)) {
        depth++
        complexity += depth * 2
        for (const item of obj) {
          calc(item)
        }
      }
      if (typeof obj[key] === "object") {
        depth++
        complexity += depth * 3
        calc(obj[key])
      }
    }
  })(obj);
  return complexity;
}
console.log(chkComplexity(data1));
console.log(chkComplexity(data2));
gillall
  • Because the function uses global variables it will return different values if you call it twice, even if you provide the same data each time – Gershom Maes Aug 06 '20 at 22:01
  • Suggestion: have `depth` be a parameter and `complexity` be the return value. This will give you proper isolation for these shifting values. As it is now, you increment depth when descending but have neglected to decrement it after returning from that descent, so your depth values are going to balloon upwards wrongly. Consider `[[[1]],2,3]`: what depth values should `2` and `3` have? (A sketch of this fix follows.) – Segfault Aug 07 '20 at 20:27
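A minimal sketch of Segfault's suggested fix (not the original answer's code): `depth` is passed down as a parameter and `complexity` is the return value, so each call is isolated and depth no longer balloons:

function chkComplexity(obj, depth = 1) {
  let complexity = 0;
  if (Array.isArray(obj)) {
    complexity += depth * 2; // arrays keep the original answer's arbitrary weight
    for (const item of obj) complexity += chkComplexity(item, depth + 1);
  } else if (obj !== null && typeof obj === "object") {
    complexity += depth * 3; // objects likewise
    for (const key of Object.keys(obj)) complexity += chkComplexity(obj[key], depth + 1);
  } else {
    complexity += depth; // leaf values score their own depth
  }
  return complexity;
}

console.log(chkComplexity(data1));
console.log(chkComplexity(data2));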