2

Hope you are having a great day.

I'm trying to understand what is the fastest approach to do the following:

Assuming I have these two Arrays:

var firstArray = ["a","b","c"]
var secondArray = ["a","d","e"]

I'd like to get as an output:

1)Array of objects that inside firstArray but no in secondArray .
1)Array of objects that inside secondArray but no in firstArray .
3)Array of the common objects between firstArray and secondArray.

So basically the output would be:

1) ["b","c"]
2) ["d","e"]
3) ["a"]

The main issue here is to understand what is the most efficient way to do so. Thank you very much!

Roi Mulia
  • 5,626
  • 11
  • 54
  • 105
  • Are the arrays sorted? Do they contain strings, or is this just an example? If they're not strings, can the objects be compared with `<` ? – vacawama Jul 09 '17 at 09:54
  • 1
    Convert the arrays to sets and then use `subtracting` and `intersection` – Paulw11 Jul 09 '17 at 09:54
  • If the order of the elements is not important for you, and if (as in your example), all elements are unique, you might want to consider using set algebra rather than the an ordered collection type such as `Array`. E.g. using `Set` in Swift allows you to use `subtract(_:)` for 1) and 2), and `intersection(_:)` for 3), which all use O(1) (amortized) lookup for comparing elements between the sets (as compared to e.g. using O(n) `contains(_:)` of `Array` (with `Equatable` elements) to lookup existance of some specified element). See [`Set`](https://developer.apple.com/documentation/swift/set). – dfrib Jul 09 '17 at 09:54
  • 1
    If the items are unique https://stackoverflow.com/questions/24589181/set-operations-union-intersection-on-swift-array – vadian Jul 09 '17 at 09:55

2 Answers2

5

If your arrays are sorted and the items are unique in each array, the fastest way would be to handle each of the items only once. Start by comparing the first items in each array; if they are equal, put that into the common array and then move on to the second items. If one item is less than another, it goes into the unique array of the lesser item and you move on to the next item in the lesser array. Continue this process until you run out of items for one array, then put the remaining items of the second array into the unique items array for that array.

var i = 0
var j = 0

let a = ["a", "b", "c"]
let b = ["a", "d", "e"]

var aUnique = [String]()
var bUnique = [String]()
var common = [String]()

while i < a.count && j < b.count {
    if a[i] == b[j] {
        common.append(a[i])
        i += 1
        j += 1
    } else if a[i] < b[j] {
        aUnique.append(a[i])
        i += 1
    } else {
        bUnique.append(b[j])
        j += 1
    }
}

if i < a.count {
    // put remaining items into aUnique
    aUnique += a[i ..< a.count]
} else if j < b.count {
    // put remaining items into bUnique
    bUnique += b[j ..< b.count]
}

print(common)  // ["a"]
print(aUnique) // ["b", "c"]
print(bUnique) // ["d", "e"]

Analysis

  • This algorithm appends one item to one of the arrays each time through the loop. It will loop at most a.count + b.count - 1 times if both arrays are unique relative to each other, or only their last item is common.
  • If both arrays are identical, it will loop only a.count times.
  • If all elements of array b are greater than the elements of array a, it will loop only a.count times. If all elements of array a are greater than the elements of array b, it will loop only b.count times.
vacawama
  • 150,663
  • 30
  • 266
  • 294
  • Hey! Thank you for your detailed answer! I’ve made a Benchmark (my first one) for both of your answers. In this specific scenario, the scenario that I will eventually face, we can assume that for each array values are unique + we don't need specific order. I’m adding a link for the xCode benchmark project https://www.dropbox.com/s/5t5ktu2soea56wu/Benchmark.zip?dl=0 . I’ve tested it on a real device (6+) and it seems that @vacawama method is faster. But for the sake of the knowledge I want to make sure.Can you take a quick look and update if the benchmark preformed correctly? Thank you again! – Roi Mulia Jul 09 '17 at 12:17
  • I've updated the method. I've tested on 6+ with 100k elements on Release. Again thank you for the detailed answer! Much appreciated. Always learning new stuff :) that's the results : Time elapsed for vacawamaMethod: 0.000427007675170898 s. ========= ========= ========= ========= ========= Time elapsed for dfriMethod: 1.23837202787399 s. @vacawama method is much faster. – Roi Mulia Jul 09 '17 at 12:54
3

I will assume that the elements of your arrays are Equatable.

If they are also Hashable, and if the order of the elements is not important for you, and if (as in your example), all elements are unique, you might want to consider using set algebra rather than the an ordered collection type such as Array. E.g. using Set in Swift allows you to use the subtract(_:) or subtracting(_:) mutating/non-methods for 1) and 2), and intersection(_:)/formIntersection(_:) for 3), which all use O(1) (amortized) lookup for comparing elements between the sets (as compared to e.g. using O(n) contains(_:) of Array (with Equatable elements) to lookup existance of some specified element).

For additional details, see the language reference for Set as well as the thread linked to by vadian:


If the elements in each array is not unique, and you want to keep multiples as well as the order between the elements, you could use the Set representation of one of the arrays while filtering the other one.

E.g., for:

var firstArray = ["a","b","c"]
var secondArray = ["a","d","e"]

A) in O(n):

let excludeElements = Set(secondArray)        // O(n)
secondArray = secondArray
    .filter { !excludeElements.contains($0) } // O(n) due to O(1) (amortized) .contains lookup

B) in O(n):

let excludeElements = Set(firstArray)         // O(n)
secondArray = secondArray
    .filter { !excludeElements.contains($0) } // O(n) due to O(1) (amortized) .contains lookup

C) in O(n), using order and duplicates as they occur in firstArray:

let includeElements = Set(secondArray)  // O(n)
let commonElements = firstArray
    .filter(includeElements.contains)   // O(n) due to O(1) (amortized) .contains lookup

C) in O(n), using order and duplicates as they occur in secondArray:

let includeElements = Set(firstArray) // O(n)
let commonElements = secondArray
    .filter(includeElements.contains) // O(n) due to O(1) (amortized) .contains lookup

Performance?

The above only looks at asymptotic time complexity, and doesn't take into account any actual benchmarking. Generally the functional methods such as filter are slower than just a for or while loop, so if performance becomes an issue for your app, you should consider, at that point, performing profiling as well as custom bench-marking possible bottlenecks in your algorithms.

Also, if your arrays are known to be sorted, there are more efficient ways to traverse them and filter out the result. See e.g. the following thread (C language, but the logic is the important part):

dfrib
  • 70,367
  • 12
  • 127
  • 192
  • I didn't look at your answer or your link until I had answered. It's kind of spooky how I chose names "a", "b", "i", "j", and "common" like your linked answer, but it is just a coincidence (and the fact that those are "obvious" names for naming impaired programmers). – vacawama Jul 09 '17 at 11:02
  • @vacawama Great minds think alike, I guess :) I also think it's good with an explicit Swift version, and your answer also includes the unique cases queried by the OP. – dfrib Jul 09 '17 at 11:07
  • Hey! Thank you for your detailed answer! I’ve made a Benchmark (my first one) for both of your answers. In this specific scenario, the scenario that I will eventually face, we can assume that for each array values are unique + we don't need specific order. I’m adding a link for the xCode benchmark project https://www.dropbox.com/s/5t5ktu2soea56wu/Benchmark.zip?dl=0 . I’ve tested it on a real device (6+) and it seems that @vacawama method is faster. But for the sake of the knowledge I want to make sure.Can you take a quick look and update if the benchmark preformed correctly? Thank you again! – Roi Mulia Jul 09 '17 at 12:17
  • @RoiMulia, I had a typo in my algorithm and I noticed you still have the incorrect version. The index for `b` is always `j` so the correct line is `} else if a[i] < b[j] {`. You should test performance in Release mode. – vacawama Jul 09 '17 at 12:31
  • I've updated the method. I've tested on 6+ with 100k elements on Release. Again thank you for the detailed answer! Much appreciated. Always learning new stuff :) that's the results : Time elapsed for vacawamaMethod: 0.000427007675170898 s. ========= ========= ========= ========= ========= Time elapsed for dfriMethod: 1.23837202787399 s. @vacawama method is much faster – Roi Mulia Jul 09 '17 at 12:55
  • @RoiMulia, my method depends on the arrays being sorted. I don't believe your test is producing sorted a and b. If you sort a and b before calling my method, it flies. If you include the time to sort a and b into the test, the result is much closer (or even favors dfri's use of Set). – vacawama Jul 09 '17 at 13:00
  • @vacawama and if i don't need them sorted at all? I mean, when you say sorted, you mean alphabetic? – Roi Mulia Jul 09 '17 at 14:56
  • @RoiMulia as vacawama writes (and as I mention briefly in the bottom of my answer): if you can make the assumption that your array is _already sorted_ (e.g. by construction), then this strong property can be used to implement a very quick method for "filtering" into your different categories, as has been shown in detailed in vacawama:s answer. However, if you cannot assume your arrays to be sorted (and this is only by coincidence in your example), you need to sort them _prior_ to applying the subsequent algorithm. ... – dfrib Jul 09 '17 at 15:15
  • ... I believe Swift use an introsort hybrid, which runs in `O(n log(n))` asymptotically (average & worst case), so in case of non-sorted arrays, the `Set` approach(es) above should be favourably, asymptotically (`O(n)` as compared to `O(n log(n))`). – dfrib Jul 09 '17 at 15:15
  • @RoiMulia, sorted by how `<` applies to `String`. So `a.sort()` and `b.sort()` would be be needed before applying the algorithm I presented. Your examples were sorted, so I presented the fastest way to create your arrays given sorted input with the caveat "If your arrays are sorted ...". – vacawama Jul 09 '17 at 16:29
  • @vacawama Hey! I'll Benchmark again when i'll be back at the office and update here. I've just understand what both of you meant. I'll benchmark it again. Thank you for clarifying. – Roi Mulia Jul 09 '17 at 17:48