3

Consider a matrix of dimension Nx2, where each row contains the lower and upper bound of a uniform PDF (i.e., probability density function).

I want to count the number of overlaps, where an overlap is defined as the condition in which two PDFs are overlapping, e.g.:

  • PDF1: [2,5]
  • PDF2: [3,6]
  • The two PDFs are overlapping in the interval [3,5].

Obviously, if three PDFs p1, p2 and p3 are overlapping, I count three overlaps: p1 vs. p2, p1 vs. p3, p2 vs. p3.

I created the following MATLAB code that counts the overlaps:

for m = 1:N-1
    for k = m+1:N
        l1 = dataService.getObjectCoordinate(m,1);
        l2 = dataService.getObjectCoordinate(k,1);
        u1 = dataService.getObjectCoordinate(m,2);
        u2 = dataService.getObjectCoordinate(k,2);
        if (l1 <= l2 && l2 <= u1) || (l2 <= l1 && l1 <= u2)
            numOverlaps = numOverlaps + 1;
        end
    end
end 

However, as you can imagine, this goes as O(N^2), which is reeeeeally bad when N is big. I started the execution three hours ago with N=10000 and it is still running.

May you suggest a way of reducing the complexity of the proposed algorithm, maybe excluding some of the comparisons a priori?

Thanks in advance.

Eleanore
  • 1,750
  • 3
  • 16
  • 33
  • 1
    @MZimmerman6: The question I linked to has at least one answer that does it in O(N·logN). – Rody Oldenhuis Sep 04 '13 at 12:54
  • @RodyOldenhuis Yeah I just looked at that, and I am writing code to do it in less ... I think. We will see how this goes. So far what I have is running in under a second – MZimmerman6 Sep 04 '13 at 13:07

2 Answers2

3

I take back the comment I left earlier. You can definitely do this in a shorter amount of time. Based on the link provided by Rody and Shoelzer, you can use the following code to do this operation in under a second

tic
numIntervals = 10000;
ranges = sort(randi(100,[numIntervals,2]),2);
[vals,idx] = sort(ranges(:,1));
ranges = ranges(idx,:);
overlaps = false(numIntervals);
for i = 1:numIntervals
    temp = [ranges(:,1) <= ranges(i,2),ranges(:,1) >= ranges(i,1)];
    overlaps(:,i) = logical(all(temp,2));
end
overlaps = tril(overlaps,-1);
toc

ranges would be an array of your interval start and end points.

The purpose of the lower triangle portion at the end is to remove any duplicate pairs. If P1 overlaps P2 then obviously P2 will overlap P1. It will also remove the fact that P1 overlaps itself by removing the diagonal

Do be very careful running this with large numbers, as the amount of storage it uses will very quickly fill up your RAM, depending on the amount you have. I tried keeping everything as a logical array to help in this, but it still adds up fast.

You could definitely remove the storage portion and save yourself a ton of time, but then you have to handle everything immediately in each loop.

MZimmerman6
  • 8,445
  • 10
  • 40
  • 70
  • Correct me if I'm wrong, but I think this is just O(N²) -- in each iteration you do 3N comparisons (don't forget the `all()`), and you loop N times. The fact that it runs fast is not because the algorithm is fast, it's because 10000 is not that big a number. – Rody Oldenhuis Sep 05 '13 at 07:56
  • it probably is still O(n^2) it is just using the power of logical operators in MATLAB. – MZimmerman6 Sep 05 '13 at 12:01
  • I realize that, but remember that big-O refers to the *algorithm* complexity; it is implicitly assumed that the best implementation possible in the language is used. So using the power of MATLAB operators *and* having an O(N·logN) algorithm makes execution times like the one you found also possible for much larger N -- your O(N²) will diverge much faster as N grows. – Rody Oldenhuis Sep 05 '13 at 12:21
  • Of course, I understand that fact. I did not necessarily state anything about its complexity. But in terms of what the poster was asking, for 10000 intervals, this code will not take hours like the current implementation does. – MZimmerman6 Sep 05 '13 at 13:11
  • True. I just took an academic interest in this problem :) – Rody Oldenhuis Sep 05 '13 at 13:15
  • I completely understand. I tend to do the same but trying to implement the linked page's code quickly and easily was getting frustrating, so I figured ~10 lines in MATLAB was a good solution :) – MZimmerman6 Sep 05 '13 at 13:16
  • Well, you have my vote :) – Rody Oldenhuis Sep 05 '13 at 13:18
1

Have you profiled your code? A large part of the problem may be that you're calling dataService.getObjectCoordinate() four times per iteration. Instead, try getting all the data one time and store it in arrays before doing any comparisons.

After that, use the techniques described in answers to Possible Interview Question: How to Find All Overlapping Intervals.

Community
  • 1
  • 1
shoelzer
  • 10,648
  • 2
  • 28
  • 49