0

I basically want to use the function ismember, but for a range. For example, I want to know what data points in array1 are within n distance to array2, for each element in array2.

I have the following:

array1 = [1,2,3,4,5]
array2 = [2,2,3,10,20,40,50]

I want to know what values in array2 are <= 2 away from array1:

indices(1,:) (where array1(1) = 1)     = [1 1 1 0 0 0 0]
indices(2,:) (where array1(2) = 2)     = [1 1 1 0 0 0 0]
indices(3,:) (where array1(3) = 3)     = [1 1 1 0 0 0 0]
indices(4,:) (where array1(4) = 4)     = [1 1 1 0 0 0 0]
indices(5,:) (where array1(5) = 5)     = [0 0 1 0 0 0 0]

Drawbacks:

My array1 is 496736 elements, my array2 is 9268 elements, so aI would rather not use a loop.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
UpperEastSide
  • 117
  • 1
  • 16
  • I expect the output would be an array of size `[496736 x 9268]`. – rahnema1 Sep 16 '19 at 08:02
  • that is correct – UpperEastSide Sep 17 '19 at 00:08
  • You got some nice answers here, but the right approach of course is to build a binary tree to convert the search from O(n^2) to O(n log n). In multiple dimensions you’d use an R-tree to find nearby points. In 1D it’s a lot more trivial even. – Cris Luengo Sep 17 '19 at 01:18
  • I rolled back your answer summarising edit. Please reserve the question-space for only the *question* itself. If you would like to comment on individual answers, please comment on them. If you want to make a comparison between answers (be it timing, memory or syntax wise) you can add your own answer to do so if you want. – Adriaan Sep 17 '19 at 13:20

4 Answers4

6

Looping is a valid option here. Intialise an array output to the size of array1 X array2, then loop over all elements in array1 an subtract array2 from that, then check whether the absolute value is less than or equal to 2:

array1 = [1,2,3,4,5];
array2 = [2,2,3,10,20,40,50];

output = zeros(numel(array1), numel(array2),'logical');

for ii = 1:numel(array1)
   output(ii,:) = abs(array1(ii)-array2)<=2;
end
output =

     1     1     1     0     0     0     0
     1     1     1     0     0     0     0
     1     1     1     0     0     0     0
     1     1     1     0     0     0     0
     0     0     1     0     0     0     0

i.e. loops are not the problem.

Thanks to Rahnema1's suggestion, you can initialise output directly as a logical matrix:

output = zeros(numel(array1),numel(array2),'logical');

whose size is just 4.3GB.


On timings: Hans' code runs in a matter of seconds for array1 = 5*rand(496736,1); array2 = 25*rand(9286,1);, the looped solution takes about 15 times longer. Both solutions are equal to one another. obcahrdon's ismembertol solution is somewhere in between on my machine.


On RAM usage:

  • Both implicit expansion, as per Hans' answer, as well as the loop suggested in mine work with just 4.3GB RAM on your expanded problem size (496736*9286)
  • pdist2 as per Luis' answer and bsxfun as per Hans' on the other hand try to create an intermediate double matrix of 34GB (which doesn't even fit in my RAM, so I cannot compare timings).
  • obchardon's ismembertol solution outputs a different form of the solution, and takes ~5.04GB (highly dependent on the amount of matches found, the more, the larger this number will be).

In general this leads me to the conclusion that implicit expansion should be your option of choice, but if you have R2016a or earlier, ismembertol or a loop is the way to go.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
4

Using implicit expansion, introduced in MATLAB R2016b, you can simply write:

abs(array1.' - array2) <= 2

ans =
  1  1  1  0  0  0  0
  1  1  1  0  0  0  0
  1  1  1  0  0  0  0
  1  1  1  0  0  0  0
  0  0  1  0  0  0  0

For earlier MATLAB versions, you can get this using the bsxfun function:

abs(bsxfun(@minus, array1.', array2)) <= 2

ans =
  1  1  1  0  0  0  0
  1  1  1  0  0  0  0
  1  1  1  0  0  0  0
  1  1  1  0  0  0  0
  0  0  1  0  0  0  0

Hope that helps!

P.S. On the "MATLAB is slow for loops" myth, please have a look at that blog post for example.


EDIT: Please read Adriaan's answer on the RAM consumption using this and/or his approach!

HansHirse
  • 18,010
  • 10
  • 38
  • 67
  • The `bsxfun` solution also borks on memory. Implicit expansion is quite the haul-over apparently! – Adriaan Sep 16 '19 at 09:27
2

If you have the Statistics Toolbox, you can use pdist2 to compute the distance matrix:

result = pdist2(array1(:), array2(:)) <= 2;

As noted by @Adriaan, this is not efficient for large input sizes. In that case a better approach is a loop with preallocation of the logical matrix output, as in his answer.

Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
2

You can also use ismembertol with some specific option:

A       = [1,2,3,4,5];
B       = [2,2,3,10,20,40,5000];
tol     = 2;
[~,ind] = ismembertol([A-tol;A+tol].',[B-tol;B+tol].',tol, 'ByRows', true, ...
    'OutputAllIndices', true, 'DataScale', [1,Inf])

It will create a 5x1 cell array containing the corresponding linear indice

ind{1} = [1,2,3]
ind{2} = [1,2,3]
...
ind{5} = [3]

In this case using linear indices instead of logical indices will greatly reduce the memory consumption.

obchardon
  • 10,614
  • 1
  • 17
  • 33
  • Thanks! I added to RAM comparison to my answer. As yours give a different form, its size is different. In this case it's 700MB *larger* than a pure logical matrix. I guess if you expect few matches, this solution would be more viable than the others. Is there a way to output the indices as integers, as that's what they are anyway? For the OP's full case `ind` will be a cell containing almost 500k double arrays. – Adriaan Sep 16 '19 at 13:22