How to extract all possible matching arrays of objects from one array of objects?

Question

I have an array of objects, e.g.

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

Let's say I am only interested in objects whose keys correspond to var input = ["ab", "bc"]. It means that I want to extract all possible subarrays with result[i].length == 2 in the following way:

var result = [
    [{"ab": "i"}, {"bc": "x"}],
    [{"ab": "4"}, {"bc": "x"}] // or [{"bc": "x"}, {"ab": "4"}]
];

— that is, the order of objects in subarrays is absolutely not important: I am only interested in the fact that each subarray contains two objects — {"ab": ...} and {"bc": ...}.

If I was interested in var input = ["a","a","ab"], the result should be like this:

var result = [
    [{"a": "x"}, {"a": "nm"}, {"ab": "i"}],
    [{"a": "x"}, {"a": "nm"}, {"ab": "4"}]
];

I cannot find the way to achieve the desired result (assuming that input.length may be much greater than 2 or 3 — even 15–20 may be not enough) without factorial-level amount of computations, which is not physically possible. Is there a way to have some reasonable performance for solving such a problem?
Important note: yes, obviously, for relatively large values of input.length there theoretically may be possible to have very huge numbers of possible combinations, but in practice, result.length will always be reasonably small (maybe 100–200, I even doubt that it could reach 1000...). But for safety, I would want to just set some limit (say, 1000), such that as soon as result.length reaches this limit, the function just returns the current result and stops.

@DavinTryon: Step 1. check if `arr` contains `{"ab":value}`. If yes, get the next `{"bc":value}` and put them both in `result`. Step 2. check if `arr` contains `{"bc":value}`. If yes, get the next `{"ab":value}` and put them both in `result`. And so on, which requires a factorial-level number of possible situations. — lyrically wicked, Oct 03 '16 at 07:33
Overcomplicated. IMO you should change your data model so you would not have problems with data filtering / converting. — DamianoPantani, Oct 03 '16 at 08:15
Could you elaborate how and why your method should produce the example output for `["a", "a", "ab"]`? How should the "algorithm" decide if a value is part of the first *a* or the latter? Scan the `input` first and then decide that there's more than 1 *a*, the latter should receive the rest? Or were you perhaps actually looking for the product of found objects for each key? — Ilja Everilä, Oct 03 '16 at 08:39
@Ilja Everilä: How should the "algorithm" decide if a value is part of the first a or the latter? Scan the input first and then decide that there's more than 1 a, the latter should receive the rest? // The fact that there can be duplicated strings in the input does not matter at all. Is `result[i+1]`different from `result[i]`? Yes. That's what matters. — lyrically wicked, Oct 03 '16 at 08:51
Is `[{"a": "nm"}, {"a": "x"}, {"ab": "4"}]` not "unique" when compared to `[{"a": "x"}, {"a": "nm"}, {"ab": "4"}]` and `[{"a": "x"}, {"a": "nm"}, {"ab": "i"}]`, or are you not interested in order? What should the output be, if there were more than 2 objetcs with key *a*? Are you looking for the set of sets of filtered values? — Ilja Everilä, Oct 03 '16 at 09:11
@IljaEverilä: are you not interested in order? // Yes, I mentioned this in the question. I am only interested that the function returns **at least two subarrays with unique data** — note that if you sort `[{"a": "nm"}, {"a": "x"}, {"ab": "4"}]` and `[{"a": "x"}, {"a": "nm"}, {"ab": "4"}]`, they will represent exactly the same data and become duplicates. You can then find all duplicates of `[{"a": "x"}, {"a": "nm"}, {"ab": "i"}]`, but in any case, you cannot extract more than 2 arrays with **unique data** from the given source. — lyrically wicked, Oct 04 '16 at 04:40
So you want a set of sets with rows containing duplicates removed. There's 1 edge still that is unclear in your post: what is the result, if input is `["a", "a", "ab"]` and `arr` contains only 1 "a" object. Would that be an empty set, since only `[{"a": "x"}, {"a": "x"}, ...]` could be produced. — Ilja Everilä, Oct 04 '16 at 05:21
@IljaEverilä: what is the result, if input is ["a", "a", "ab"] and arr contains only 1 "a" object // Obviously, such situations are absolutely guaranteed to be excluded: we could always check that `arr` contains at lest two `{"a": ...}` objects and one `{"ab": ...}` object. But in practice, I won't even need such additional check. And the specific details of what could be the result if `arr` doesn't match `input` are not important. We can just return an empty array. — lyrically wicked, Oct 04 '16 at 06:31
You seem to misunderstand the nature of a Q/A site. Nothing that you do not explicitly state is obvious and hinders producing quality answers. — Ilja Everilä, Oct 04 '16 at 06:39

gsamaras · Answer 1 · 2016-10-03T08:30:31.680

Sort alphabetically arr and input, which is O(nlogn), and if you are able to make that as you build the arrays, you may be benefited.

Let me explain my idea with an example:

var arr = [
    {"a": "x"},
    {"ab": "i"},
    {"ab": "4"},
    {"abc": "L"}
    {"bc": "x"},
];
var input = ["ab", "bc"];

Search for input[0] in arr (linearly or even with binary search to speed it up). Mark the index.

Search for input[1] in arr, but consider only the subarray of arr, from the index marked in the previous step, to the end of it.

If you find all the elements of input, then push that to the results (you can keep a temporary object for that).

Now, we have to search again for input[0], since it may be that two or more entries share that key. You will have stored that index I mentioned before, so that you will start searching again from this index, and since arr is sorted, you would have to check only the very next element and so on.

Time complextiy:

Sort your arrays (assuming you couldn't have them sorted when you built them): O(nlogn), where n is the number of elements arr has.

Binary search in arr for input[0]: O(logn)

Now the next step of search (for input[1]) is much less than the length of arr, so a very pessimistic bound would be O(n). In practise it won't be O(n) of course, and if you want you can do a binary search for input[1] too, which would cost O(logm), where m is the size of the arr[index_stored: -1].

At this point, we move on to to finding the next occurence of input[0], if any of course, but because we have stored the index we know exactly where to start searching, and we have to check the next element only, that's a constant cost, thus O(1).

And then we do the same for input[1] as above, which is cheap again.

Now, it all depends on the length of input, which is k, and it seems that k < n, and how many occurrences of a key you have, right?

But assuming a normal-avergae situation, the whole procedure has a time complextiy of:

O(nlogn)

However, notice that you have to pay a bit of extra memory to store the indices, which is subject to the number of occurences a key has. With a brute force aglrotihm, which would be slower, you wouldn't need to pay anything extra for memory.

Can you estimate the time required to get the result (assuming that its length has some limit as described in the question) when `arr` contains, say, at least 30 different objects with random keys, and the length of `input` is, say, at least 10? — lyrically wicked, Oct 03 '16 at 08:03
@lyricallywicked the theoretical analysis is subject of many assumptions, I updated with a very simple one. Hope that helps. I mean what you are asking is a fairly new question! :) — gsamaras, Oct 03 '16 at 08:31

Vladimir M · Answer 2 · 2016-10-03T08:56:23.080

Perhaps not the most optimal way. I'd probably use some library for the final solution, but here is a number of steps that would do the trick for a happy path. I'll add a bit of comments shortly.

Generate a map for a single key in the source array (i.e. at which indexes it is seen, as we may have multiple entries)

   function getKeyMap( src, key ){
        var idx_arr = [];
        src.forEach(function(pair,idx){ if(Object.keys(pair)[0] === key){ idx_arr.push(idx)} });
        return idx_arr;
    }

And this mapping has to be done for all the keys you want to be part of filtering

function getKeysMap( src, keys ){
    var keys_map = [];
    keys.forEach(function(aKey){
        var aMap = getKeyMap(src,aKey);
        if( aMap.length ){
            keys_map.push(aMap);
        }

    });
    // if keys map lenght is less then keys length then you should throw an exception or something
    return keys_map;
}

Then you want to build all the possible combinations. I use recursion here, in perhaps not the most optimal way

function buildCombos( keys_map, carry, result ){
    if( keys_map.length === 0){
        result.push(carry);
        return;
    }
    var iter = keys_map.pop();
    iter.forEach(function(key){
        var cloneMap = keys_map.slice(0);
        var clone = carry.slice(0);
        clone.push(key);
        buildCombos(cloneMap, clone, result);
    });
}

Then I need to filter the result to exclude double entries, and entries with the repeating indices

function uniqueFilter(value, index, self) {
    return self.indexOf(value) === index;
}

function filterResult( map ){
    var filter = {};
    map.forEach(function(item){
        var unique = item.filter( uniqueFilter );
        if(unique.length === item.length){
            filter[unique.sort().join('')]=true;}
        });
    return filter;
}

And then I simply decode the resulting filtered map into the original data

function decodeMap( map,src ){
    var result = [];
    Object.keys(map).forEach(function(item){
        var keys = item.split('');
        var obj = [];
        keys.forEach(function( j ){
            obj.push( src[j])
        });
        result.push(obj);
    });
    return result;
}

The wrapper

function doItAll(arr, keys){
    // Get map of they keys in terms of numbers
    var maps = getKeysMap( arr, keys);
    // build combinations out of key map
    var combos = [];
    buildCombos(maps,[],combos);
    // filter results to get rid of same sequences and same indexes ina sequence
    var map = filterResult(combos);
    // decode map into the source array
    return decodeMap( map, arr )
}

Usage:

var res = doItAll(arr, ["a","a","ab"])

acontell · Accepted Answer · 2016-10-04T06:57:04.783

Seeing the problem, it kind of look like a cartessian product. In fact, if before operating, the data model is modified a bit, the expected result is, in almost all cases, a cartessian product. However, there's a case (the second example you provided) that needs special treatment. Here's what I did:

Tweak a bit the model data (this will be done only once) in order to have something suitable to apply cartessian product.
Treat the "special case" of having more than one parameter requesting the same string.

All the important logic is within cartessianProdModified. The important bits in the code are commented. Hope it helps you with your problem or at least gives you some ideas.

Here's a fiddle and here's the code:

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"},
    {"dummy": "asdf"}
];

// Utility function to be used in the cartessian product
function flatten(arr) {
    return arr.reduce(function (memo, el) {
        return memo.concat(el);
    }, []);
}

// Utility function to be used in the cartessian product
function unique(arr) {
    return Object.keys(arr.reduce(function (memo, el) {
        return (memo[el] = 1) && memo;
    }, {}));
}

// It'll prepare the output in the expected way
function getObjArr(key, val, processedObj) {
    var set = function (key, val, obj) {
        return (obj[key] = val) && obj;
    };
    // The cartessian product is over so we can put the 'special case' in an object form so that we can get the expected output.
    return val !== 'repeated' ? [set(key, val, {})] : processedObj[key].reduce(function (memo, val) {
        return memo.concat(set(key, val, {}));
    }, []);
}

// This is the main function. It'll make the cartessian product.
var cartessianProdModified = (function (arr) {
    // Tweak the data model in order to have a set (key: array of values)
    var processedObj = arr.reduce(function (memo, obj) {
        var firstKey = Object.keys(obj)[0];
        return (memo[firstKey] = (memo[firstKey] || []).concat(obj[firstKey])) && memo;
    }, {});

    // Return a function that will perform the cartessian product of the args.
    return function (args) {
        // Spot repeated args.
        var countArgs = args.reduce(function (memo, el) {
                return (memo[el] = (memo[el] || 0) + 1) && memo;
            }, {}),
            // Remove repeated args so that the cartessian product works properly and more efficiently.
            uniqArgs = unique(args);

        return uniqArgs
                .reduce(function (memo, el) {
                    return flatten(memo.map(function (x) {
                        // Special case: the arg is repeated: we have to treat as a unique value in order to do the cartessian product properly
                        return (countArgs[el] > 1 ? ['repeated'] : processedObj[el]).map(function (y) {
                            return x.concat(getObjArr(el, y, processedObj));
                        });
                    }));
                }, [[]]);
    };
})(arr);

console.log(cartessianProdModified(['a', 'a', 'ab']));

Can you modify the resulting `cartessianProdModified(str1, str2, str3...)` function such that it accepts two arguments (the first one being the source of data (`arr`), and the second one should be the input)? Another option: it accepts one argument, which is an array of strings (`input`)? (I don't know what option would be better, I just need the function to accept an array of strings as input, but not multiple strings as separate arguments) — lyrically wicked, Oct 04 '16 at 05:50
@lyricallywicked Sure, I've updated the function so that it can take an array of strings instead of arguments. I think this approach is better than the other because this way changing the original array into the new form is done only once. Anyway, it wouldn't be any problem changing it to accept the array of data too. Thanks. — acontell, Oct 04 '16 at 07:01

Ilja Everilä · Answer 4 · 2016-10-04T07:41:44.897

If you are able to use ES6 features, you can use generators to avoid having to create large intermediate arrays. It would seem that you want a set-of-sets of sorts, with rows containing only unique values. As others have also mentioned, you can approach this by starting with a cartesian product of objects matching your input keys:

'use strict';

function* product(...seqs) {
    const indices = seqs.map(() => 0),
          lengths = seqs.map(seq => seq.length);

    // A product of 0 is empty
    if (lengths.indexOf(0) != -1) {
        return;
    }

    while (true) {
        yield indices.map((i, iseq) => seqs[iseq][i]);
        // Update indices right-to-left
        let i;
        for (i = indices.length - 1; i >= 0; i--) {
            indices[i]++;
            if (indices[i] == lengths[i]) {
                // roll-over
                indices[i] = 0;
            } else {
                break;
            }
        }
        // If i is negative, then all indices have rolled-over
        if (i < 0) {
            break;
        }
    }
}

The generator only holds the indices in between iterations and generates new rows on demand. To actually join the objects that match your input keys, you first have to for example create a lookup:

function join(keys, values) {
    const lookup = [...new Set(keys)].reduce((o, k) => {
        o[k] = [];
        return o;
    }, {});

    // Iterate over array indices instead of objects them selves.
    // This makes producing unique rows later on a *lot* easier.
    for (let i of values.keys()) {
       const k = Object.keys(values[i])[0];
       if (lookup.hasOwnProperty(k)) {
           lookup[k].push(i);
       } 
    }

    return product(...keys.map(k => lookup[k]));
}

You then need to filter out rows containing duplicate values:

function isUniq(it, seen) {
    const notHadIt = !seen.has(it);
    if (notHadIt) {
        seen.add(it);
    }
    return notHadIt;
}

function* removeDups(iterable) {
    const seen = new Set();
    skip: for (let it of iterable) {
        seen.clear();
        for (let x of it) {
            if (!isUniq(x, seen)) {
                continue skip;
            }
        }
        yield it;
    }
}

And also globally unique rows (the set-of-sets aspect):

function* distinct(iterable) {
    const seen = new Set();
    for (let it of iterable) {
        // Bit of a hack here, produce a known order for each row so
        // that we can produce a "set of sets" as output. Rows are
        // arrays of integers.
        const k = it.sort().join();
        if (isUniq(k, seen)) {
            yield it;
        }
    }
}

To tie it all up:

function* query(input, arr) {
    for (let it of distinct(removeDups(join(input, arr)))) {
        // Objects from rows of indices
        yield it.map(i => arr[i]);
    }
}

function getResults(input, arr) {
    return Array.from(query(input, arr));
}

In action:

const arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

console.log(getResults(["a", "a", "ab"], arr));
/*
[ [ { a: 'x' }, { a: 'nm' }, { ab: 'i' } ],
  [ { a: 'x' }, { a: 'nm' }, { ab: '4' } ] ]
*/

And the obligatory jsFiddle.

score 0 · Answer 5 · edited May 23 '17 at 10:30

0

You can do it manually with loops, but you can also use the built-in functions Array.prototype.filter() to filter the array and Array.prototype.indexOf to check if an element is inside another array:

var filtered = arr.filter(function(pair){
    return input.indexOf(Object.keys(pair)[0]) != -1;
});

This gives you array with just the objects that match your criteria.

Now the thing with the result array in math language is called "combinations". This is exactly what you want, so I won't describe it here. A way to generate all combinations of array (set) is given here - https://stackoverflow.com/a/18250883/3132718

So here is how to use this function:

// function assumes each element is array, so we need to wrap each one in an array 
for(var i in filtered) {
    filtered[i] = [filtered[i]];
}
var result = getCombinations(filtered, input.length /* how many elements in each sub-array (subset) */);

Object.keys(pair)[0] is a way to get the first key of an object without iterating (https://stackoverflow.com/a/28670472)

edited May 23 '17 at 10:30

Community

1
1

answered Oct 03 '16 at 07:32

Al.G.

4,327
6
31
56

And how should I use it? How do I get the desired `result` (as described in the question), e.g., for the above `arr` and `input = ["a","a","ab"]`? – lyrically wicked Oct 03 '16 at 07:42
What is `getCombinations`? Is this some function with O(n!)? If yes, it would be unacceptable. – lyrically wicked Oct 03 '16 at 08:09
Look at the implementation (link in answer). In your case it can be simplified as your elements are not arrays but "single" objects. – Al.G. Oct 03 '16 at 08:14
The problem is that this `getCombinations(filtered, input.length)` requires at least `factorialOf(input.length)` computations. – lyrically wicked Oct 04 '16 at 04:33

How to extract all possible matching arrays of objects from one array of objects?

5 Answers5