1

I have to carry out a challenge that involves the elaboration of an algorithm to compute the Pareto (set) boundary. The statement is basically:

Given a set S of n points in the square [0,1] x [0,1], make an algorithm to determine the subset P contained in S, formed by the non-dominated points of S.

It is also said that it is easy to elaborate an algorithm of the order n*n point comparisons that accomplish this purpose. Well I came up with an algorithm by researching here and there. The challenge is still to implement an algorithm of the order n*log(n). How do I get the order of these algorithms?

Thanks in advance!

#data
set.seed(103)
x = runif(200)
y = runif(200)

#algorithm
pareto = 1:length(x)
for(i in 1:length(x)){
    cond1 = y[i]!=min(y[which(x==x[i])])
    cond2 = x[i]!=min(x[which(y==y[i])])
    for(k in 1:length(x)){
        if((x[i]>x[k]  &  y[i]>y[k]) | (x[i]==x[k] & cond1) | (y[i]==y[k] & cond2)){
            pareto[i] = NA
            break
        }
    }
}
xPareto = x[pareto[!is.na(pareto)]]
yPareto = y[pareto[!is.na(pareto)]]

#graphic:
plot(x, y)
points(xPareto, yPareto, col=2, pch=16)
dat = data.frame(x=xPareto, y=yPareto)
dat = dat[order(dat$x),]
for(i in 1:(nrow(dat)-1)){
    segments(dat$x[i], dat$y[i], dat$x[i+1], dat$y[i+1], col=2, lty=2)
}

enter image description here

jassis
  • 416
  • 2
  • 12

1 Answers1

2

The intuition behind the efficient greedy solution to this problem lies in the fact that a point i is dominated by point j iff x[i] > x[j] and y[i] > y[j], which implies that j must come before i when the points are ordered by either coordinate. Hence, if we traverse the points in increasing order of their x-coordinates, then the point j (if any) that dominates point i must have been traversed before point i is traversed. In other words, it is impossible for the dominating point j to come after the dominated point i in this ordering.

Thus, with this traversal order the domination problem (i.e. checking if a point is dominated by some other point) boils down to checking if we have already seen a point with a lower y-coordinate as the traversal order already enforces the x-coordinate condition. This can easily be done by checking each point's y-coordinate to the lowest (minimum) y-coordinate we have seen so far -- if the minimum y-coordinate is less than the current point i's y-coordinate then the point j with the minimum y-coordinate dominates i as x[j] < x[i] because j was seen before i.

Sorting by the x-coordinate takes O(n log n) time and checking each point (while maintaining the partial minimum y-coordinate) takes O(n) time, making the entire algorithm take O(n log n) time.

o = order(x)
minY = Inf

pareto = 1:n
for(j in 1:n){
    i = o[j]
    if (y[i] <= minY) {
        # Part of pareto boundary
        minY = y[i]
    } else {
        # Dominated by some point in pareto boundary
        pareto[i] = NA
    }
}
EvilTak
  • 7,091
  • 27
  • 36
  • Thank you @EvilTak. But how do I demonstrate mathematically that these algorithms are O(n^2) and O(n log n)? Do you know any material that works with these accounts? I am not from the area, but I find it super interesting. – jassis Apr 19 '21 at 17:01
  • The proper term for this is [analyzing an algorithm](https://en.wikipedia.org/wiki/Analysis_of_algorithms). You can find many resources online that talk about the [time complexity](https://en.wikipedia.org/wiki/Time_complexity) analysis of algorithms. If you are looking for books, any good algorithms book will do -- a popular choice is [CLRS](https://en.wikipedia.org/wiki/Introduction_to_Algorithms). – EvilTak Apr 19 '21 at 22:22