20

I have two object arrays:

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

I want to do an inner join for these two arrays a and b, and create a third array like this (if the position property is not present, then it becomes null):

var result = [
  {id: 4, name: 'Greg', position: null},
  {id: 1, name: 'David', position: null},
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

My approach:

function innerJoinAB(a, b) {
    a.forEach(function(obj, index) {
        // Search through objects in first loop
        b.forEach(function(obj2, i2) {
            // Find objects in 2nd loop
            // if obj is present in obj2 then push to result.
        });
    });
}

But the time complexity is O(N^2). How can I do it in O(N)? My friend told me that we can use reducers and Object.assign.

I'm not able to figure this out. Please help.
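For what it's worth, here is one sketch (not necessarily what the friend had in mind) of how reduce and Object.assign can combine for this in O(n+m), assuming id is the unique join key and a later match should overwrite an earlier one:

```javascript
var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

// Fold both arrays into one id-keyed lookup object. For each record,
// merge it over whatever is already stored under that id; the default
// {position: null} is overwritten whenever a record carries a position.
var merged = a.concat(b).reduce(function (acc, obj) {
  acc[obj.id] = Object.assign({position: null}, acc[obj.id], obj);
  return acc;
}, {});

// Flatten the lookup back into an array.
var result = Object.keys(merged).map(function (k) { return merged[k]; });

console.log(result);
```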

TechnoCorner
    You have two arrays of objects. Seems like you need to copy all the values of one array to a new one, then merge the second (and subsequent) arrays into it. [*Array.prototype.reduce*](http://www.ecma-international.org/ecma-262/7.0/index.html#sec-array.prototype.reduce) might be a good start. What is the primary key, *id*? Since you're using an array to hold objects, likely you also want to create an index of ID to array index so you can easily find IDs without having to iterate over the array each time. – RobG Feb 24 '17 at 01:08
    PS inner join might not be the right term as that from my understanding only gives a result set where there was a match in both sets (so your example would only give lines with IDs 2 & 3). This is more of a typical merge. – Marty Feb 24 '17 at 01:10
  • @NicholasSmith this is JS, not JSON – Hamms Feb 24 '17 at 01:30
  • Possible duplicate of [Merge two array of objects based on a key](https://stackoverflow.com/questions/46849286/merge-two-array-of-objects-based-on-a-key) – Heretic Monkey May 20 '19 at 20:25
    What you want, based on your output example, is a full outer join, not an inner join. – svenema Mar 17 '20 at 12:17
  • *full outer join* - yes, the question is about a full outer join. See for example [Different Types of SQL JOINs](https://www.w3schools.com/sql/sql_join.asp), image: https://i.imgur.com/yhYDsI2.png. – Henke Apr 15 '21 at 15:39
  • I have a major problem with this question. - It asks for an inner join, but then goes on to describe a full outer join in the desired output. So which one is it? - The _inner join_ as stated? - Or the _full outer join_ as described by the `result` array / output example? – Henke Apr 28 '21 at 06:39

8 Answers

14

I don't know how reduce would help here, but you could use a Map to accomplish the same task in O(n):

const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'}];

const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'}];

var m = new Map();
// Insert all entries keyed by ID into the Map, filling in placeholder
// 'position' since the Array 'a' lacks 'position' entirely:
a.forEach(function(x) { x.position = null; m.set(x.id, x); });

// For values in 'b', insert them if missing, otherwise, update existing values:
b.forEach(function(x) {
    var existing = m.get(x.id);
    if (existing === undefined)
        m.set(x.id, x);
    else
        Object.assign(existing, x);
});

// Extract resulting combined objects from the Map as an Array
var result = Array.from(m.values());

console.log(JSON.stringify(result));

Because Map accesses and updates are O(1) on average (hash collisions and rehashing can make individual operations slower), the whole join is O(n+m), where n and m are the lengths of a and b respectively; the naive solution you gave would be O(n*m), using the same meaning for n and m.

ShadowRanger
  • One issue with this: If position is set in array `a`, it will be lost if it is not duplicated in `b`. – Gerrit0 Feb 24 '17 at 01:38
  • @Gerrit0: Yar, I noted that assumption (that `a` always lacks a `position`) in the comments. You could easily adjust it to make the set of `x.position` when working on `a` conditional, but the OP's provided inputs indicated that `a` never had `position` and `b` always did. Similarly, this assumes that `id` is unique on its own (no need for `name` as part of the key, with the assumption that `name` will match if `id` does). – ShadowRanger Feb 24 '17 at 01:45
    This looks like a left join to me, not an inner join. – phil Nov 02 '18 at 20:31
    @phil: Agreed. The OP asked for inner join, but their desired output was a left join. I provided an answer that would produce their desired output, as it seems clear they were using the wrong terminology. – ShadowRanger Nov 03 '18 at 02:56
11

One of the ways to solve it:

const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

const r = a.filter(({ id: idv }) => b.every(({ id: idc }) => idv !== idc));
const newArr = b.concat(r).map((v) => v.position ? v : { ...v, position: null });

console.log(JSON.stringify(newArr));
kind user
4

If you drop the null criterion (many in the community say that using null is bad), then there's a very simple solution:

let a = [1, 2, 3];
let b = [2, 3, 4];

a.filter(x => b.includes(x)) 

// [2, 3]
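Note that includes() is itself a linear scan, so the one-liner above is O(n*m). If you need O(n+m), the same idea works with a Set (a sketch of that variant):

```javascript
let a = [1, 2, 3];
let b = [2, 3, 4];

// Build a Set once (O(m)); each has() lookup is then O(1) on average,
// so the whole intersection is O(n + m) instead of O(n * m).
const bSet = new Set(b);
const intersection = a.filter(x => bSet.has(x));

console.log(intersection); // [2, 3]
```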
Stephen
3

To reduce the time complexity, you inevitably have to use more memory.

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]     

var s = new Set();
var result = [];
b.forEach(function(e) {
    result.push(Object.assign({}, e));
    s.add(e.id);
});
a.forEach(function(e) {
    if (!s.has(e.id)) {
      var temp = Object.assign({}, e);
      temp.position = null;
      result.push(temp);
    }
});
console.log(result);

Update

As @Blindman67 mentioned: "You do not reduce the problems complexity by moving a search into the native code." I consulted the ECMAScript® 2016 Language Specification about the internal procedures of Set.prototype.has() and Map.prototype.get(); unfortunately, it seems that they both iterate through all the elements they hold.

Set.prototype.has ( value )#

The following steps are taken:

    Let S be the this value.
    If Type(S) is not Object, throw a TypeError exception.
    If S does not have a [[SetData]] internal slot, throw a TypeError exception.
    Let entries be the List that is the value of S's [[SetData]] internal slot.
    Repeat for each e that is an element of entries,
        If e is not empty and SameValueZero(e, value) is true, return true.
    Return false. 

http://www.ecma-international.org/ecma-262/7.0/#sec-set.prototype.has

Map.prototype.get ( key )#

The following steps are taken:

    Let M be the this value.
    If Type(M) is not Object, throw a TypeError exception.
    If M does not have a [[MapData]] internal slot, throw a TypeError exception.
    Let entries be the List that is the value of M's [[MapData]] internal slot.
    Repeat for each Record {[[Key]], [[Value]]} p that is an element of entries,
        If p.[[Key]] is not empty and SameValueZero(p.[[Key]], key) is true, return p.[[Value]].
    Return undefined. 

http://www.ecma-international.org/ecma-262/7.0/#sec-map.prototype.get

Perhaps, we can use the Object which can directly access its properties by their names, like the hash table or associative array, for example:

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]     

var s = {};
var result = [];
b.forEach(function(e) {
    result.push(Object.assign({}, e));
    s[e.id] = true;
});
a.forEach(function(e) {
    if (!s[e.id]) {
      var temp = Object.assign({}, e);
      temp.position = null;
      result.push(temp);
    }
});
console.log(result);
Yichong
  • You misunderstand the spec. `Set` is describing basic logic in the docs, not an actual implementation strategy. `get` and `has` are required to be sublinear, so at worst they're `O(log n)` and the suggested implementation is a hash table, `O(n)`. Read earlier in [the overall `Set` docs](http://www.ecma-international.org/ecma-262/7.0/#sec-set-objects): – ShadowRanger Feb 24 '17 at 11:43
  • "Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection. The data structures used in this Set objects specification is only intended to describe the required observable semantics of Set objects. It is not intended to be a viable implementation model." – ShadowRanger Feb 24 '17 at 11:43
  • One question @Y.C Why did you use result.push(Object.assign({}, e)); why can't we just push result.push(e); it will give the same result? – TechnoCorner Feb 25 '17 at 00:47
  • @TechnoCorner `Object.assign()` performs a shallow clone; it copies the values of all enumerable own properties from one or more source objects to a target object. So if you change the objects in the `result` array, it won't affect the original ones, because they're different objects with the same content. But if you don't mind making changes to the original objects, then obviously `result.push(e)` is more efficient. – Yichong Mar 01 '17 at 04:23
  • @ShadowRanger So, you can't be sure which browser is using which implementation or mechanism, but you know that objects in JS can be used as hash table or associative array. – Yichong Mar 01 '17 at 04:40
  • @Y.C.: Pretty sure the exact implementation of JS objects isn't required to be backed by a hash table either. They must act like associative arrays, but that doesn't actually put constraints on implementation (and in fact, after JIT-ing, I strongly suspect they avoid relatively slow string hashing for cases where objects are being used as real objects with a set of fixed attributes, rather than glorified string keyed hash tables). Point is, you can't be sure either way, but using an actual data structure for its intended purpose is usually a good starting point. – ShadowRanger Mar 01 '17 at 04:52
  • @ShadowRanger `obj.property` is most commonly used in JS, if every time you use the `.` operator, a search must be made, that would be terrible, almost impossible. Therefore, I'm pretty sure that `.` operator has the ability to directly access the properties in an object. – Yichong Mar 01 '17 at 05:10
  • @Y.C.: Given that the vast majority of objects would have only a few dozen properties at most, binary search at `O(log n)` would only be a handful of tests (e.g. ~6 tests for 32 properties); in practice, due to collisions, hash tables have to perform a handful of tests in most cases anyway, so the gap would not be that large. Mind you, I don't think they actually did that, but then, I don't think anyone implemented Map or Set in terms of anything but a hash table either; you can't attack one approach on the basis of a gut feeling and not subject your own approach to the same criticism. – ShadowRanger Mar 01 '17 at 05:17
  • @ShadowRanger As far as I know, a hash table uses a hash function to compute an index that serves as an entry point to the desired value. In most cases there won't be collisions, and even when a collision arises, an object with 32 properties needs only one or two extra tests at most, unless the hash algorithm is really bad. The only cost of searching a hash table is that you have to compute the index. – Yichong Mar 01 '17 at 07:01
1

You do not reduce the problem's complexity by moving a search into the native code. The search must still be done.

Also, the need to null an undefined property is one of the many reasons I dislike using null.

So without the null the solution would look like

var a = [
  {id: 4, name: 'Greg',position: '7'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]


function join (indexName, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName],
                Object.assign(item, map.get(item[indexName]))
            );
        })
    })
    return [...map.values()];
}

And is called with

const joinedArray = join("id", a, b);

To join with a default is a little more complex but should prove handy as it can join any number of arrays and automatically set missing properties to a provided default.

Testing for the defaults is done after the join to save a little time.

function join (indexName, defaults, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName], 
                Object.assign( 
                    item, 
                    map.get(item[indexName])
                )
            );
        })
    })
    return [...map.values()].map(item => Object.assign({}, defaults, item));

}

To use

const joinedArray = join("id", {position : null}, a, b);

You could add...

    arrays.shift().forEach((item) => {  // first array is a special case.
        map.set(item[indexName], item);
    });

...at the start of the function to save a little time, but I feel it's more elegant without the extra code.

Blindman67
0

Here is an attempt at a more generic version of a join which accepts N objects and merges them based on a primary id key.

If performance is critical, you are better off using a specific version like the one provided by ShadowRanger which doesn't need to dynamically build a list of all property keys.

This implementation assumes that any missing properties should be set to null, and that every object in each input array has the same properties (though properties can differ between arrays).

var a = [
    {id: 4, name: 'Greg'},
    {id: 1, name: 'David'},
    {id: 2, name: 'John'},
    {id: 3, name: 'Matt'},
];
var b = [
    {id: 5, name: 'Mathew', position: '1'},
    {id: 600, name: 'Gracia', position: '2'},
    {id: 2, name: 'John', position: '2'},
    {id: 3, name: 'Matt', position: '2'},
];

console.log(genericJoin(a, b));

function genericJoin(...input) {
    //Get all possible keys
    let template = new Set();
    input.forEach(arr => {
        if (arr.length) {
            Object.keys(arr[0]).forEach(key => {
                template.add(key);
            });
        }
    });

    // Merge arrays
    input = input.reduce((a, b) => a.concat(b));

    // Merge items with duplicate ids
    let result = new Map();
    input.forEach(item => {
        result.set(item.id, Object.assign((result.get(item.id) || {}), item));
    });

    // Convert the map back to an array of objects
    // and set any missing properties to null
    return Array.from(result.values(), item => {
        template.forEach(key => {
            item[key] = item[key] || null;
        });
        return item;
    });
}
Gerrit0
0

Here's a generic O(n*m) solution, where n is the number of records and m is the number of keys. This will only work for valid object keys. You can convert any value to base64 and use that if you need to.

const join = ( keys, ...lists ) =>
    lists.reduce(
        ( res, list ) => {
            list.forEach( ( record ) => {
                let hasNode = keys.reduce(
                    ( idx, key ) => idx && idx[ record[ key ] ],
                    res[ 0 ].tree
                )
                if( hasNode ) {
                    const i = hasNode.i
                    Object.assign( res[ i ].value, record )
                    res[ i ].found++
                } else {
                    let node = keys.reduce( ( idx, key ) => {
                        if( idx[ record[ key ] ] )
                            return idx[ record[ key ] ]
                        else
                            idx[ record[ key ] ] = {}
                        return idx[ record[ key ] ]
                    }, res[ 0 ].tree )
                    node.i = res[ 0 ].i++
                    res[ node.i ] = {
                        found: 1,
                        value: record
                    }
                }
            } )
            return res
        },
        [ { i: 1, tree: {} } ]
         )
         .slice( 1 )
         .filter( node => node.found === lists.length )
         .map( n => n.value )

join( [ 'id', 'name' ], a, b )

This is essentially the same as Blindman67's answer, except that it adds an index object to identify records to join. The records are stored in an array and the index stores the position of the record for the given key set and the number of lists it's been found in.

Each time the same key set is encountered, the node is found in the tree, the element at its index is updated, and the number of times it's been found is incremented.

Finally, the idx object is removed from the array with the slice, and any elements that weren't found in every set are filtered out. This makes it an inner join; you could remove this filter and have a full outer join.

Each element is then mapped to its value, and you have the merged array.
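To illustrate that last point, here is a small standalone sketch (separate from the join function above, and keyed on id alone) of the difference: filtering to matches gives an inner join, while keeping every key gives a full outer join:

```javascript
const a = [
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
  {id: 4, name: 'Greg'},
];
const b = [
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
  {id: 5, name: 'Mathew', position: '1'},
];

// Index the first array by id.
const m = new Map(a.map(x => [x.id, x]));

// Inner join: only ids present in both arrays survive.
const inner = b.filter(x => m.has(x.id))
               .map(x => Object.assign({}, m.get(x.id), x));

// Full outer join: every id from either array survives.
const outerMap = new Map(m);
b.forEach(x => outerMap.set(x.id, Object.assign({}, outerMap.get(x.id), x)));
const outer = Array.from(outerMap.values());

console.log(inner.map(x => x.id)); // [2, 3]
console.log(outer.map(x => x.id)); // [2, 3, 4, 5]
```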

Snowbldr
0

From the extensive research I've done on this, there is no way to reduce the joining of two lists below O(n*m).

The classical solution, which I understand most databases use, is to create an index from the smaller list and then scan that index. This essentially pushes the O(n*m) "work" as far down the interpreter chain as possible; i.e., your OS/processor API probably has a very optimized way of orchestrating list compares, so you get a performance boost from letting it do the job. This technically makes it O(n*m + n), but it should still be the most efficient.

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

const idx = a.reduce((prev, _) => {
    prev[_.id] = _
    return prev
}, {})

const result = b.reduce((prev, _) => {
    if (idx[_.id] !== undefined) {
        prev.push([_, idx[_.id]])
    }
    return prev
}, [])

Again, as far as I understand, this is probably the classic solution to this problem. I'd love to be wrong.

Jamie Marshall