0

EDIT: Thanks to previous comments of @Bergi and @MattEllen, I have progressed in my thougths, so I repost this question.

Suppose that we have two sorted lists with duplicated values

// here arr1.length = arr2.length = 16
ar11 = 0 0 0 0 1 1 1 1 2 2 2 2 3 3 4 5 
arr2 = 0 0 1 1 1 1 2 2 2 2 3 4 5 6 7 8

and we want to compute their interesection (isect)

// duplicates preserved !
isect = 0 0 1 1 1 1 2 2 2 2 3 4 5

we can use a simple linear algorithm adapted from Python version described inside: Intersection of two lists including duplicates?

// basic intersection code
// allows lists to handle duplicates
const dumb_intersect = (arrays, intervals, params) => {
    const { arr1, arr2, results }  = arrays
    const { matches} = params
    const { i1, i2 } = intervals

    ii_iterate(arr1, i1, (value, rank) => {    
        const count = matches.get(value) || 0
        matches.set(value, count + 1)
    })

    ii_iterate(arr2, i2, (value, rank) => {
        const count = matches.get(value) || 0
        if(count > 0) {
           results.push(value)
           if (count > 1) {
                matches.set(value, count - 1)
           } else {
                matches.delete(value)
           }
       }
    })
}

In this example:

  • matches is a global map, and it could be reused for later calls of dumb_intersect
  • i1 = { min:0, max: 15 } and { min:0, max: 15 } (obvious!)
  • ii_iterate takes a list and an interval, and applies a callback to each element of the list when its index is bound to the interval

For the sake of simplicity I give you the code of ii_iterate:

// apply a callback function to each element of a slice of an array
const ii_iterate = (arr, ii, callback) => {
    return arr.slice(ii.min, ii.max + 1).map((val, idx) => {

        return callback(val, idx)
    })
}

TL:DR; ;-)

ALL THAT STUFF IS AWESOME, BUT I THINK I COULD DO BETTER AND FASTER !!!

Especially if we binary (dichotomic) serie of cuts of the intervals until a thresold:

// THRESOLD = 4 elements
cuts[0]: [0..15]                            # 1 interval * 16 elements
cuts[1]: [0..7] [8..15]                     # 2 * 8
cuts[2]: [0..3] [4..7] [8..11] [12..15]     # 4 * 4

... and apply dumb_intersect only of overlapping intervals

Normally, for huge lists thresold is computed easily:

thresold = 1 + Math.floor(Math.log((1 + arr1.length) * (1 + arr2.length)))

N = 1000    thresold = 15
N = 1000000 thresold = 28

But this is a minor aspect of the problem. Bianry split process is also easy to do. Arr1 and Arr2 of the initial example becomes, with a thresold of 4 :

ar11 = 0 0 0 0; 1 1 1 1; 2 2 2 2; 3 3 4 5
arr2 = 0 0 1 1; 1 1 2 2; 2 2 3 4; 5 6 7 8

(remember that that could be large lists)

This problem seems to be a "turtle rallye racing" between two lists of interval :

  • I need some king of cursors to iterate sperately the lists
  • then compare intervals for INF_STRICT, SUP_STR or MATCHING
  • apply dumb_intersect to MATCHING(s) intervals
  • until it ends

Since complexity of dumb_intersect is linear O(N) I 'm not sure today that it could be optimized in O(log(N)) or O(sqrt(N)) --> so I will just use the linear version for future coding !!

Thanks to all, best regards.

1 Answers1

0

Notice also that arr1 and arr2 are pre-sorted lists, with ascending order.

In that case, what is all the hassle with splitting the intervals about? We just need a merge algorithm to do this:

function intersection(arrays, intervals) {
    const { arr1, arr2, results } = arrays
    const { i1, i2 } = intervals

    let i = i1.min, j = i2.min;
    while (i <= i1.max && j <= i2.max) {
        if (arr1[i] < arr2[j]) {
            i++
        } else if (arr1[i] > arr2[j]) {
            j++
        } else { // arr1[i] == arr2[j]
            results.push(arr1[i])
            i++
            j++
        }   
    }
}
Bergi
  • 630,263
  • 148
  • 957
  • 1,375