0

I didn't find anything around. What I'm trying to achieve is to filter out an [n x 3] array based on another array m x 1

Let's say that:

arr1 = [n x 3] 'where n is roughly 500'000
arr2 = [m x 1] 'where m is roughly 500

arr1 it's structured like this:

arr1(1,1) = ID1 | arr1(1,2) = String1 | arr1(1,3) = Bolean1
arr1(2,1) = ID2 | arr1(2,2) = String2 | arr1(2,3) = Bolean2

arr2 it's structured like this:

arr2 (1) = ID2 | arr2 (2) = ID23 | arr2 (3) = ID345 | arr2 (4) = ID4567 | arr2 (5) = ID6381

What I need to do is filter arr1 to exclude all those records where arr1(i,1) is included in arr2 and arr1(i,2) <> "String2". Then paste the filtered array onto a worksheet

Community
  • 1
  • 1
gmeroni
  • 571
  • 4
  • 16
  • I would recommend replacing `arr2` with a [Dictionary object](http://stackoverflow.com/a/1309739/5090771). It works like a hash table, in that you use the value as the key. you can test if it exists without looping. otherwise, you might use [`Match` / `Index`](http://stackoverflow.com/a/7031744/5090771) but that sounds slow... – WhiteHat Dec 22 '15 at 13:22
  • thanks @WhiteHat, i'll definitelylook into it – gmeroni Dec 22 '15 at 13:41

1 Answers1

1

Add an extra field to arr1 arr1 is [n x 4]

  1. Make the extra field a boolean initialized as False

  2. Step through arr2, for each ID in arr2, set the corresponding entry in arr1 to True e.g. if we find ID23 in arr2, and we find ID23 in arr1 at arr1(x,1), set arr1(x,4) = True

  3. Copy arr1 to new array line at a time, only copying the items where arr1(x,4) = False

Also you may consider using a class instead of a multi-dimentional array. You can then build collections and use For Each to step through them. This can run more quickly and makes adding new fields easier.

user1582568
  • 288
  • 2
  • 7
  • 1
    I think that this approach force me to loop both for `n` first and then `m`. Am I right? If so It'll took too long since `n = 500'000` and `m = 500` – gmeroni Dec 22 '15 at 10:41
  • If the IDs in arr1 are in numerical order them implementing a binary search will significantly improve the speed. It may be more efficient to perform the checks earlier in your code as you build arr1 if this is possible. If the IDs in arr1 are in a random order but arr2 s in numerical order then is would be quicker to step through arr1 and use a binary search on arr2 for each item in arr1. If both are random then step through arr2 and build an ordered list of entries. Then step through arr1 and use the binary search on your ordered list. – user1582568 Dec 22 '15 at 10:55
  • both are random order. If i filter directly a range? Is it possible filter a range based on values inside a variant array? – gmeroni Dec 22 '15 at 11:08
  • I am not aware of a way, but if I were going to code it for random arrays, with arr2 being smaller and arr1 being very large, I would first build an ordered version of arr2, then step through arr1 and for each entry check if its ID is in the ordered version of arr2. This check can use a fast binary search. Each item which is not in arr2 can be added to the output list. – user1582568 Dec 22 '15 at 11:15