This was an interesting problem. Here's what I came up with:
// Utility functions
const isInt = Number.isInteger
const path = (ps = [], obj = {}) =>
ps .reduce ((o, p) => (o || {}) [p], obj)
const assoc = (prop, val, obj) =>
isInt (prop) && Array .isArray (obj)
? [... obj .slice (0, prop), val, ...obj .slice (prop + 1)]
: {...obj, [prop]: val}
const assocPath = ([p = undefined, ...ps], val, obj) =>
p == undefined
? obj
: ps.length == 0
? assoc(p, val, obj)
// create a missing node (an array if the next key is an integer) without writing it into obj
: assoc(p, assocPath(ps, val, obj[p] || (isInt(ps[0]) ? [] : {})), obj)
// Helper functions
function * getPaths(o, p = []) {
if (Object(o) !== o || Object .keys (o) .length == 0) yield p
if (Object(o) === o)
for (let k of Object .keys (o))
yield * getPaths (o[k], [...p, isInt (Number (k)) ? Number (k) : k])
}
const canonicalPath = (path) =>
path.map (n => isInt (Number (n)) ? 0 : n)
const splitPaths = (xs) =>
Object .values ( xs.reduce (
(a, p, _, __, cp = canonicalPath (p), key = cp .join ('\u0000')) =>
({...a, [key]: a [key] || {canonical: cp, path: p} })
, {}
))
// Main function
const canonicalRep = (data) => splitPaths ([...getPaths (data)])
.reduce (
(a, {path:p, canonical}) => assocPath(canonical, path(p, data), a),
Array.isArray(data) ? [] : {}
)
// Test
const data = [{"dog": "lmn", "tiger": [{"bengoltiger": {"height": {"x": 4}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"width": {"a": 8}}, "indiantiger": {"b": 3}}]}, {"dog": "pqr", "lion": 90, "tiger": [{"bengoltiger": {"width": {"m": 3}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"height": {"n": 8}}, "indiantiger": {"b": 3}}]}]
console .log (
canonicalRep (data)
)
The first few functions are plain utility functions that I would keep in a system library. They have plenty of uses outside this code:
isInt is simply a first-class alias for Number.isInteger.
path finds the nested property of an object along a given path:
path(['b', 1, 'c'], {a: 10, b: [{c: 20, d: 30}, {c: 40}], e: 50}) //=> 40
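Because each step falls back to an empty object with (o || {}), path returns undefined rather than throwing when an intermediate node is missing. A small check, reusing the definition above (the sample object is the one from the example):

```javascript
// Same definition as above: walk the path, treating a missing node as {}
const path = (ps = [], obj = {}) =>
  ps.reduce((o, p) => (o || {})[p], obj)

const obj = {a: 10, b: [{c: 20, d: 30}, {c: 40}], e: 50}

console.log(path(['b', 1, 'c'], obj))   // 40
console.log(path(['b', 5, 'c'], obj))   // undefined: b[5] does not exist
console.log(path(['x', 'y', 'z'], obj)) // undefined, and no TypeError
console.log(path([], obj) === obj)      // true: an empty path returns the object itself
```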
assoc returns a new object that clones your original, but with the value of a certain property set to (or replaced with) the supplied one:
assoc('c', 42, {a: 1, b: 2, c: 3, d: 4}) //=> {a: 1, b: 2, c: 42, d: 4}
Note that internal objects are shared by reference where possible.
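A quick way to see both properties, the untouched original and the shared internals, is to compare references after an update. This reuses the assoc definition from above; the sample objects are made up for illustration:

```javascript
const isInt = Number.isInteger

// Same definition as above: copy an array or object, replacing one slot
const assoc = (prop, val, obj) =>
  isInt(prop) && Array.isArray(obj)
    ? [...obj.slice(0, prop), val, ...obj.slice(prop + 1)]
    : {...obj, [prop]: val}

const original = {a: {x: 1}, b: [1, 2, 3]}
const updated = assoc('c', 42, original)

console.log(original.c)               // undefined: the original is untouched
console.log(updated.a === original.a) // true: nested objects are shared, not copied
console.log(JSON.stringify(assoc(1, 'two', [1, 2, 3]))) // [1,"two",3]: arrays stay arrays
```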
assocPath does the same thing along a deeper path, building nodes as needed:
assocPath(['a', 'b', 1, 'c', 'd'], 42, {a: {b: [{x: 1}, {x: 2}], e: 3})
//=> {a: {b: [{x: 1}, {c: {d: 42}, x: 2}], e: 3}}
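To illustrate "building nodes as needed", the sketch below starts from an empty object. The type of each missing node is chosen by the next path segment: an integer creates an array, anything else an object. In this sketch the missing node is created locally rather than assigned into obj, so the input is never modified:

```javascript
const isInt = Number.isInteger

const assoc = (prop, val, obj) =>
  isInt(prop) && Array.isArray(obj)
    ? [...obj.slice(0, prop), val, ...obj.slice(prop + 1)]
    : {...obj, [prop]: val}

// When the next node is missing, create an array if the following
// segment is an integer, otherwise an object (without touching obj)
const assocPath = ([p = undefined, ...ps], val, obj) =>
  p == undefined
    ? obj
    : ps.length == 0
      ? assoc(p, val, obj)
      : assoc(p, assocPath(ps, val, obj[p] || (isInt(ps[0]) ? [] : {})), obj)

const start = {}
const built = assocPath(['a', 0, 'b'], 1, start)

console.log(JSON.stringify(built)) // {"a":[{"b":1}]}
console.log(JSON.stringify(start)) // {}: the input is unchanged
```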
Except for isInt, these borrow their APIs from Ramda. (Disclaimer: I'm a Ramda author.) But these implementations are my own, not Ramda's code.
The next function, getPaths, is an adaptation of one from another SO answer. It lists all the paths in your object in the format used by path and assocPath: arrays whose entries are integers for integer-like keys (such as array indices) and strings otherwise. Unlike the function it was borrowed from, it only returns paths to leaf values (empty objects and arrays count as leaves).
For your original object, it returns an iterator over these paths:
[
[0, "dog"],
[0, "tiger", 0, "bengoltiger", "height", "x"],
[0, "tiger", 0, "indiantiger", "foor"],
[0, "tiger", 0, "indiantiger", "paw"],
[0, "tiger", 1, "bengoltiger", "width", "a"],
[0, "tiger", 1, "indiantiger", "b"],
[1, "dog"],
[1, "lion"],
[1, "tiger", 0, "bengoltiger", "width", "m"],
[1, "tiger", 0, "indiantiger", "foor"],
[1, "tiger", 0, "indiantiger", "paw"],
[1, "tiger", 1, "bengoltiger", "height", "n"],
[1, "tiger", 1, "indiantiger", "b"]
]
If I wanted to spend more time on this, I would replace that version of getPaths with a non-generator version, just to keep this code consistent. It shouldn't be hard, but I'm not interested in spending more time on it.
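For anyone who does want one, here is a possible non-generator sketch that returns an array directly via flatMap. It aims to keep the same rules as the generator: a path is emitted for every primitive leaf and for every empty object or array, and integer-like keys become numbers. It is an illustration, not the answer's code:

```javascript
const isInt = Number.isInteger

// Non-generator getPaths: returns an array of paths instead of an iterator
const getPaths = (o, p = []) =>
  Object(o) !== o
    ? [p]                                 // primitive leaf
    : Object.keys(o).length == 0
      ? [p]                               // empty object or array is also a leaf
      : Object.keys(o).flatMap(k =>
          getPaths(o[k], [...p, isInt(Number(k)) ? Number(k) : k]))

console.log(JSON.stringify(getPaths([{dog: 'lmn', tiger: [{paw: 'a'}]}, {lion: 90, z: []}])))
// [[0,"dog"],[0,"tiger",0,"paw"],[1,"lion"],[1,"z"]]
```

Since it returns an array, callers would use getPaths(data) in place of [...getPaths(data)].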
We can't use those results directly to build your output, since they refer to array elements beyond the first one. That's where splitPaths and its helper canonicalPath come in. We create the canonical paths by replacing all integers with 0, giving us a data structure like this:
[{
canonical: [0, "dog"],
path: [0, "dog"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "height", "x"],
path: [0, "tiger", 0, "bengoltiger", "height", "x"]
}, {
canonical: [0, "tiger", 0, "indiantiger", "foor"],
path: [0, "tiger", 0, "indiantiger", "foor"]
}, {
canonical: [0, "tiger", 0, "indiantiger", "paw"],
path: [0, "tiger", 0, "indiantiger", "paw"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "width", "a"],
path: [0, "tiger", 1, "bengoltiger", "width", "a"]
}, {
canonical: [0, "tiger", 0, "indiantiger", "b"],
path: [0, "tiger", 1, "indiantiger", "b"]
}, {
canonical: [0, "lion"],
path: [1, "lion"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "width", "m"],
path: [1, "tiger", 0, "bengoltiger", "width", "m"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "height", "n"],
path: [1, "tiger", 1, "bengoltiger", "height", "n"]
}]
Note that this function also removes duplicate canonical paths. We originally had both [0, "tiger", 0, "indiantiger", "foor"] and [1, "tiger", 0, "indiantiger", "foor"], but the output only contains the first one.
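The canonicalization and first-one-wins deduplication can be checked on a few hand-picked paths. This reuses canonicalPath and splitPaths as defined above (the parameter is renamed to ps only for readability):

```javascript
const isInt = Number.isInteger

// Replace every integer segment with 0, so all array elements share one slot
const canonicalPath = (ps) =>
  ps.map(n => isInt(Number(n)) ? 0 : n)

// Keep only the first original path seen for each canonical path
const splitPaths = (xs) =>
  Object.values(xs.reduce(
    (a, p, _, __, cp = canonicalPath(p), key = cp.join('\u0000')) =>
      ({...a, [key]: a[key] || {canonical: cp, path: p}}),
    {}
  ))

const result = splitPaths([
  [0, 'tiger', 0, 'indiantiger', 'foor'],
  [1, 'tiger', 1, 'indiantiger', 'b'],
  [1, 'tiger', 0, 'indiantiger', 'foor'], // same canonical path as the first entry
])

console.log(result.length) // 2
console.log(JSON.stringify(result[1]))
// {"canonical":[0,"tiger",0,"indiantiger","b"],"path":[1,"tiger",1,"indiantiger","b"]}
```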
It does this by storing them in an object under a key created by joining the path with the non-printable character \u0000. This was the easiest way to accomplish the task, but there is an extremely unlikely failure mode [1], so if we really wanted, we could do more sophisticated duplicate checking. I wouldn't bother.
Finally, the main function, canonicalRep, builds a representation out of your object by calling splitPaths and folding over the result: each entry's canonical path says where to put the new data, and calling path with the entry's path property and the original object supplies the value to put there.
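To see the fold in isolation, here is a condensed, self-contained version of the pipeline run on a much smaller input (the small object is made up for illustration). One deliberate difference from the listing above: assocPath creates a missing node without assigning it into obj, so the accumulator is never mutated in place; the result is the same:

```javascript
const isInt = Number.isInteger
const path = (ps = [], obj = {}) => ps.reduce((o, p) => (o || {})[p], obj)
const assoc = (prop, val, obj) =>
  isInt(prop) && Array.isArray(obj)
    ? [...obj.slice(0, prop), val, ...obj.slice(prop + 1)]
    : {...obj, [prop]: val}
const assocPath = ([p = undefined, ...ps], val, obj) =>
  p == undefined ? obj
  : ps.length == 0 ? assoc(p, val, obj)
  : assoc(p, assocPath(ps, val, obj[p] || (isInt(ps[0]) ? [] : {})), obj)

function * getPaths(o, p = []) {
  if (Object(o) !== o || Object.keys(o).length == 0) yield p
  if (Object(o) === o)
    for (let k of Object.keys(o))
      yield * getPaths(o[k], [...p, isInt(Number(k)) ? Number(k) : k])
}
const canonicalPath = (ps) => ps.map(n => isInt(Number(n)) ? 0 : n)
const splitPaths = (xs) => Object.values(xs.reduce(
  (a, p, _, __, cp = canonicalPath(p), key = cp.join('\u0000')) =>
    ({...a, [key]: a[key] || {canonical: cp, path: p}}),
  {}
))

// The fold: the canonical path says where to write, the original path says what to read
const canonicalRep = (data) => splitPaths([...getPaths(data)]).reduce(
  (a, {path: p, canonical}) => assocPath(canonical, path(p, data), a),
  Array.isArray(data) ? [] : {}
)

const small = [{dog: 'lmn', tiger: [{x: 4}, {a: 8}]}, {dog: 'pqr', lion: 90}]
console.log(JSON.stringify(canonicalRep(small)))
// [{"dog":"lmn","tiger":[{"x":4,"a":8}],"lion":90}]
```

Note that the second dog value, "pqr", is dropped: only the first value found for each canonical path survives.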
Our final output, as requested, looks like this:
[
{
dog: "lmn",
lion: 90,
tiger: [
{
bengoltiger: {
height: {
n: 8,
x: 4
},
width: {
a: 8,
m: 3
}
},
indiantiger: {
b: 3,
foor: "b",
paw: "a"
}
}
]
}
]
What's fascinating to me is that I saw this as an interesting programming challenge, although I couldn't really imagine any practical uses for it. But now that I've coded it, I realize it will solve a problem in my current project that I'd put aside a few weeks ago. I will probably implement this on Monday!
Update
Some comments discuss a problem where a subsequent empty value overrides a previously filled one, causing data loss.
This version attempts to alleviate this with the following main function:
const canonicalRep = (data) => splitPaths ([...getPaths (data)])
.reduce (
(a, {path: p, canonical}, _, __, val = path(p, data)) =>
isEmpty(val) && !isEmpty(path(canonical, a))
? a
: assocPath(canonical, val, a),
Array.isArray(data) ? [] : {}
)
using a simple isEmpty helper function:
const isEmpty = (x) =>
x == null || (typeof x == 'object' && Object.keys(x).length == 0)
You might want to update or expand this helper in various ways.
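Before changing it, it may help to know exactly what the current version treats as empty: only null, undefined, and objects or arrays with no own enumerable keys. Empty strings, 0, and false all count as real data and will overwrite earlier values:

```javascript
// The isEmpty helper from above
const isEmpty = (x) =>
  x == null || (typeof x == 'object' && Object.keys(x).length == 0)

console.log(isEmpty(null), isEmpty(undefined))       // true true
console.log(isEmpty([]), isEmpty({}))                // true true
console.log(isEmpty(''), isEmpty(0), isEmpty(false)) // false false false
console.log(isEmpty({a: undefined}))                 // false: the key still exists
```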
My first pass worked fine with the alternate data supplied, but not when I switched the two entries in the outer array. I fixed that, and also made sure that an empty value is kept if it's not overridden with actual data (that's the z property in my test object).
I believe this snippet solves the original problem and the new one:
// Utility functions
const isInt = Number.isInteger
const path = (ps = [], obj = {}) =>
ps .reduce ((o, p) => (o || {}) [p], obj)
const assoc = (prop, val, obj) =>
isInt (prop) && Array .isArray (obj)
? [... obj .slice (0, prop), val, ...obj .slice (prop + 1)]
: {...obj, [prop]: val}
const assocPath = ([p = undefined, ...ps], val, obj) =>
p == undefined
? obj
: ps.length == 0
? assoc(p, val, obj)
// create a missing node (an array if the next key is an integer) without writing it into obj
: assoc(p, assocPath(ps, val, obj[p] || (isInt(ps[0]) ? [] : {})), obj)
const isEmpty = (x) =>
x == null || (typeof x == 'object' && Object.keys(x).length == 0)
function * getPaths(o, p = []) {
if (Object(o) !== o || Object .keys (o) .length == 0) yield p
if (Object(o) === o)
for (let k of Object .keys (o))
yield * getPaths (o[k], [...p, isInt (Number (k)) ? Number (k) : k])
}
// Helper functions
const canonicalPath = (path) =>
path.map (n => isInt (Number (n)) ? 0 : n)
const splitPaths = (xs) =>
Object .values ( xs.reduce (
(a, p, _, __, cp = canonicalPath (p), key = cp .join ('\u0000')) =>
({...a, [key]: a [key] || {canonical: cp, path: p} })
, {}
))
// Main function
const canonicalRep = (data) => splitPaths ([...getPaths (data)])
.reduce (
(a, {path: p, canonical}, _, __, val = path(p, data)) =>
isEmpty(val) && !isEmpty(path(canonical, a))
? a
: assocPath(canonical, val, a),
Array.isArray(data) ? [] : {}
)
// Test data
const data1 = [{"dog": "lmn", "tiger": [{"bengoltiger": {"height": {"x": 4}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"width": {"a": 8}}, "indiantiger": {"b": 3}}]}, {"dog": "pqr", "lion": 90, "tiger": [{"bengoltiger": {"width": {"m": 3}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"height": {"n": 8}}, "indiantiger": {"b": 3}}]}]
const data2 = [{"d": "Foreign Trade: Export/Import: Header Data", "a": "false", "f": [{"g": "TRANSPORT_MODE", "i": "2"}, {"k": "System.String", "h": "6"}], "l": "true"}, {"a": "false", "f": [], "l": "false", "z": []}]
const data3 = [data2[1], data2[0]]
// Demo
console .log (canonicalRep (data1))
console .log (canonicalRep (data2))
console .log (canonicalRep (data3))
Why not change assoc?
This update grew out of a discussion after I rejected an edit attempt to do the same sort of empty-checking inside assoc. I rejected that edit as too far removed from the original answer. When I learned what it was supposed to do, I knew that what had to change was canonicalRep or one of its immediate helper functions.
The rationale is simple: assoc is a general-purpose utility function designed to make a shallow clone of an object with the named property changed to the new value. It should not contain complex logic about whether that value is empty; it should remain simple.
By introducing the isEmpty helper function, we can do all this with only a minor tweak to canonicalRep.
[1] That failure mode could happen if some nodes contain that separator, \u0000. For instance, if you had paths [...nodes, "abc\u0000", "def", ...nodes] and [...nodes, "abc", "\u0000def", ...nodes], they would both map to "...abc\u0000\u0000def...". If this is a real concern, we could certainly use other forms of deduplication.
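One such alternative, sketched here as an assumption rather than part of the answer's code, is to use JSON.stringify on the canonical path as the dictionary key instead of joining with \u0000. JSON quotes and escapes each string segment, so distinct paths of strings and integers produce distinct keys. The dedupe function below is a hypothetical, stripped-down stand-in for splitPaths:

```javascript
const nodes = ['x']
const p1 = [...nodes, 'abc\u0000', 'def']
const p2 = [...nodes, 'abc', '\u0000def']

// Original key: join with a NUL separator, and these two different paths collide
console.log(p1.join('\u0000') === p2.join('\u0000'))   // true

// Alternative key: JSON.stringify quotes each segment, so they stay distinct
console.log(JSON.stringify(p1) === JSON.stringify(p2)) // false

// Hypothetical splitPaths-style dedup keyed by JSON.stringify (first one wins)
const dedupe = (paths) =>
  Object.values(paths.reduce(
    (a, p) => { const key = JSON.stringify(p); return key in a ? a : {...a, [key]: p} },
    {}
  ))

console.log(dedupe([p1, p2, p1]).length)               // 2
```

The cost is slightly longer keys; the benefit is that no separator character has to be reserved.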