0

I have a JSON file

{
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
}

with all id's unique, and say I have an array of banned ids ["123","423"] and I would like to delete all entries that have an id number in the array (so as an output I'd like the following).

{
    "data": [
        {
            "name": "Bob",
            "id": "234"
        }]
}

What would be a moderately efficient way (runs in a few seconds on an ordinary computer) to achieve this if there's a few thousand entries in the JSON and array?

a13
  • 1
  • I think you are looking for something similar to this: http://stackoverflow.com/questions/1573593/whats-the-fastest-way-to-iterate-over-an-objects-properties-in-javascript – BuddhistBeast Jul 19 '16 at 02:20
  • Possible duplicate of [Filter collection (array of objects) based on other array](http://stackoverflow.com/questions/19131187/filter-collection-array-of-objects-based-on-other-array) – Steve Ladavich Jul 19 '16 at 02:28
  • Another similar question but specifically geared toward solutions with lodash/underscore: [lodash Filter collection using array of values](http://stackoverflow.com/questions/17251764/lodash-filter-collection-using-array-of-values) – Steve Ladavich Jul 19 '16 at 03:13

5 Answers5

2

You can use the Array.prototype.filter() method in conjunction with .indexOf():

var bannedIds = ["123", "423"];
var input = {
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
};

input.data = input.data.filter(function(v) {
  return bannedIds.indexOf(v.id) === -1;
});

console.log(input);

If you don't want to overwrite the original array then just assign the result of the .filter() call to a new variable.

If the above turns out to be too slow with your large amount of data, you can try replacing .filter() with a conventional for loop, and/or replacing .indexOf() with a lookup object created from the array of banned ids.

nnnnnn
  • 147,572
  • 30
  • 200
  • 241
  • Note it's slow due to being `O(nm)`. Converting to `for` won't be that much of a performance improvement. – Spencer Wieczorek Jul 19 '16 at 02:29
  • I'd say it's O(nm), where n is the length of the original data and m is the length of the banned list. For a short banned list, this may be fine. – user94559 Jul 19 '16 at 02:29
  • @SpencerWieczorek - Well it will save a bunch of function calls, which can make a noticeable difference for huge amounts of data. Also I realised that replacing `.indexOf()` with a loop isn't as helpful as creating an object with the keys made from the banned id array, so I've edited the answer to mention that. (I won't include full code for that, because that is basically the same as the other answer with a `Set`.) – nnnnnn Jul 19 '16 at 02:30
1

If you can use ES6, you can do this:

const source = {
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }
    ]
};
const banned = ["123", "423"];

// O(n) startup cost for constant access time later
const bannedSet = new Set(banned);

// O(n)
const result = source.data.filter(x => !bannedSet.has(x.id));

console.log(result);

As mentioned in the comments, there's a startup cost for creating the Set. However, this lets you then call Set.prototype.has, which is constant.

Then, it's just a matter of iterating over every element and filtering out the ones that are in the banned set.

If you can't use ES6, you could replace Set with a plain JS object. If you have to support IE<9, use a polyfill for Array.prototype.filter (thanks @nnnnnn).

UPDATE

@SpencerWieczorek points out that the ES6 spec seems to indicate that Set.prototype.has iterates. I spoke too soon about the lookup being constant (I was carrying over my experience from other languages). Typically, sets will do better than O(n), e.g. constant or O(log n) depending on the underlying implementation. Your mileage may vary, so nnnnnn's answer may be faster in some cases.

Try a few of the solutions here with large amounts of data to confirm.

Community
  • 1
  • 1
Frank Tan
  • 4,234
  • 2
  • 19
  • 29
  • 1
    Nice. Note that `.filter()` is an ES5 method (though a polyfill is still required if the OP needs to support IE < 9). – nnnnnn Jul 19 '16 at 02:34
  • @nnnnnn Thanks! I remembered the browser compatibility incorrectly. – Frank Tan Jul 19 '16 at 02:36
  • 1
    Are you sure `Set.prototype.has` is constant? [It seems to iterate according to this](http://www.ecma-international.org/ecma-262/6.0/#sec-set.prototype.has). – Spencer Wieczorek Jul 19 '16 at 02:36
  • @SpencerWieczorek I'm actually not sure. Perhaps the implementation is dependent on the browser. I'll update my answer accordingly. – Frank Tan Jul 19 '16 at 02:41
0

EDIT

I shied away from using filter or the like because that involves creating a new array. That's actually probably fine for the data sizes we're talking about, but the approach I have below is more efficient.


On my laptop, this whole program runs in about 0.2 seconds. (It uses 10,000 entries and 100 banned IDs.)

var o = {
    data: []
};

for (var i = 0; i < 10000; i++) {
    o.data.push({
        name: i % 2 === 0 ? 'Jake' : 'Bob', // couldn't think of more names :-)
        id: ''+i // convert to string
    });
}

var banned = {};

for (var i = 0; i < 100; i++) {
    banned[''+(i * 3)] = true; // ban 0, 3, 6, 9, 12, ...
}

for (var i = o.data.length - 1; i >= 0; i--) {
    if (banned[o.data[i].id]) {
        o.data.splice(i, 1);
    }
}

console.log(o);

// { data:
//    [ { name: 'Bob', id: '1' },
//      { name: 'Jake', id: '2' },
//      { name: 'Jake', id: '4' },
//      { name: 'Bob', id: '5' },
//      { name: 'Bob', id: '7' },
//      { name: 'Jake', id: '8' },
//      { name: 'Jake', id: '10' },
//      ...
user94559
  • 59,196
  • 6
  • 103
  • 103
0

I am assuming that you have already parsed the JSON data and you have a variable pointing to the array you want to filter. Also, you have an array with the "banned" IDs.

var data = [{
        "name": "Jake",
        "id": "123"
    }, {
        "name": "Bob",
        "id": "234"
    }, {
        "name": "Joe",
        "id": "345"
    }];

var banned = ["123", "345"];

The following function wil probably do the best job that can be done in terms of performance:

// Modifies the data array "in place", removing all elements
// whose IDs are found in the "banned" array
function removeBanned(data, banned) {
    // Index the "banned" IDs by writing them as the properties
    // of a JS object for really quick read access later on
    var bannedObj = {};
    banned.forEach(function(b) { bannedObj[b] = true; });

    var index = data.length - 1;

    while (index >= 0) {
        if (bannedObj[data[index].id]) {
            data.splice(index, 1);
        }
        --index;
    }
}
Adrian Theodorescu
  • 11,664
  • 2
  • 23
  • 30
0

This one seems fast enough, but I'd suggest you make a free clean copy instead of modifying the existing array, - it may be faster.

function filterout(o,p,f) {
  var i = 0; f = f.join(); 
  while( o[i] ) {
    if( f.match( o[i][p] ) ){ o.splice(i,1) }
    i++ 
  };
}

var filter = ["123","423"];

var object =
    {
    "data": [
        {
            "name": "John",
            "id": "723"
        },
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
};

filterout( object.data, "id", filter );

console.log(JSON.stringify( object ));
Bekim Bacaj
  • 5,707
  • 2
  • 24
  • 26
  • `.pop()` doesn't take any arguments, it removes the last element in the array. – nnnnnn Jul 19 '16 at 03:27
  • Test this code with an array element that has an "id" of "42". It will get removed because you're doing `"123,423".match("42")`, which evaluates to `true`. – user94559 Jul 19 '16 at 05:44