1

today I did a test for a job and was asked to search through an array of integers, this is the question:

The goal of this exercise is to check the presence of a number in an array.

Specifications:

The items are integers arranged in ascending order.

The array can contain up to 1 million items

Implement the function existsInArray(_ numbers: [Int], _ k: Int) so that it returns true if k belongs to numbers, otherwise the function should return false.

Example:

let numbers = [-9, 14, 37, 102]
existsInArray(numbers, 102) // returns true
existsInArray(numbers, 36) //returns false

Note: Try to save CPU cycles

Alright, so I gave my answer which is the code below and waited for the result

func existsInArray(_ numbers: [Int], _ k: Int) -> Bool {
    if numbers.isEmpty {
        return false
    }
    let numbersHalfIndex: Int = (numbers.count/2)
    if k == numbers[numbersHalfIndex] {
        return true
    } else if k != numbers[0] && numbers.count == 1 {
        return false
    } else if k <= numbers[numbersHalfIndex] {
        let leftHalfNumbersArray = numbers[0 ..< numbersHalfIndex]
        return existsInArray(Array(leftHalfNumbersArray), k)
    } else if k > numbers[numbersHalfIndex] {
        let rightHalfNumbersArray = numbers[numbersHalfIndex ..< numbers.count]
        return existsInArray(Array(rightHalfNumbersArray), k)
    } else {
        return false
    }
}

So turns out that "The solution doesn't work in a reasonable time with one million items" and now I don't know what I did wrong since binary search is fast as f*ck.

My only guess is that maybe number.count or numbers[0 ...< numbersHalfIndex] or numbers[numbersHalfIndex ...< number.count] makes everything go slower than expected.

Am I tripping or something?

Edit: If anyone is curious I tested my code and Martin R code to see how much of an impact using ArraySlice have in terms of time. I used an array of 100.000.000 itens in ascending order starting from 0. Here is how I captured the time:

print("////////// MINE //////////")
var startTime = CFAbsoluteTimeGetCurrent()
print(existsInArray(numbers, 0))
var timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
print("Time elapsed for mine: \(timeElapsed) s.")

print("////////// Martin R //////////")
counter = 0
startTime = CFAbsoluteTimeGetCurrent()
print(existsInArrayOptimal(numbers, 0))
timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
print("Time elapsed for Martin R: \(timeElapsed) s.")

And here is the result:

////////// MINE //////////

true

Time elapsed for mine:

1.2008800506591797 s.

////////// Martin R //////////

true

Time elapsed for Martin R: 0.00012993812561035156 s.

It's about 1000x faster!

Martin R
  • 529,903
  • 94
  • 1,240
  • 1,382
Douglas Pfeifer
  • 147
  • 1
  • 8
  • Are you testing it on a playground? Have you tried using endIndex instead of count? – Leo Dabus Feb 07 '20 at 21:33
  • 3
    Btw you are initializing a new array with its slice – Leo Dabus Feb 07 '20 at 21:37
  • 2
    Converting the slices back into arrays with `Array(...)` is the slowest thing you're doing in there. It would be much better if you did all of the work on the slices – dan Feb 07 '20 at 21:41

1 Answers1

6

Accessing number.count is not a problem because that is a O(1) operation for arrays. And slicing with numbers[0 ...< numbersHalfIndex] is not a problem either. But Array(leftHalfNumbersArray) creates a new array from the slice, and that copies all the elements.

There are two possible ways to avoid that:

  • Update array indices (for lower and upper bound of the current search range) instead of creating arrays which are passed down the recursion.
  • Pass array slices down the recursion. Slices share the elements with the original array (as long as they are not mutated).

A demonstration of the second approach:

func existsInArray(_ numbers: ArraySlice<Int>, _ k: Int) -> Bool {
    if numbers.isEmpty {
        return false
    }
    let numbersHalfIndex = numbers.startIndex + numbers.count / 2
    if k == numbers[numbersHalfIndex] {
        return true
    } else if k < numbers[numbersHalfIndex] {
        return existsInArray(numbers[..<numbersHalfIndex], k)
    } else {
        return existsInArray(numbers[(numbersHalfIndex + 1)...], k)
    }
}

Note that array slices share their indices with the original array so that the indices do not necessarily start at zero. That's why numbers.startIndex is used for the index calculation.

And a wrapper function which takes a “real” array argument:

func existsInArray(_ numbers: [Int], _ k: Int) -> Bool {
    return existsInArray(numbers[...], k)
}

As @Leo suggested, you can implement this as a collection method instead of implementing two separate methods. Collection indices are not necessarily integers, but for a RandomAccessCollection the index calculations are guaranteed to be O(1). You can also generalize it to collections of arbitrary comparable elements instead of integers.

Here is a possible implementation:

extension RandomAccessCollection where Element: Comparable {
    /// Returns a Boolean value indicating whether the collection contains the
    /// given element. It is assumed that the collection elements are sorted
    /// in ascending (non-decreasing) order. 
    ///
    /// - Parameter element: The element to find in the collection.
    /// - Returns: `true` if the element was found in the collection; otherwise,
    ///   `false`.
    ///
    /// - Complexity: O(log(*n*)), where *n* is the size of the collection.
    func binarySearch(for element: Element) -> Bool {
        if isEmpty {
            return false
        }
        let midIndex = index(startIndex, offsetBy: count / 2)
        if element == self[midIndex] {
            return true
        } else if element < self[midIndex] {
            return self[..<midIndex].binarySearch(for: element)
        } else {
            return self[index(after: midIndex)...].binarySearch(for: element)
        }
    }
}

Usage:

let numbers = [-9, 14, 37, 102]
print(numbers.binarySearch(for: 102)) // true
print(numbers.binarySearch(for: 36))  // false

Alternatively a non-recursive method which updates the indices of the search range:

extension RandomAccessCollection where Element: Comparable {
    func binarySearch(for element: Element) -> Bool {
        var lo = startIndex
        var hi = endIndex

        while lo < hi {
            let mid = index(lo, offsetBy: distance(from: lo, to: hi) / 2)
            if element == self[mid] {
                return true
            } else if element < self[mid] {
                hi = mid
            } else {
                lo = index(after: mid)
            }
        }
        return false
    }
}
Martin R
  • 529,903
  • 94
  • 1,240
  • 1,382
  • Yep, that's it. Turns out creating a new array is slow enough to trigger a bad answer. – Douglas Pfeifer Feb 07 '20 at 21:53
  • Oh I almost forgot, how should I pass the indexes if I want to keep the function as func existsInArray(_ numbers: [Int], _ k: Int) -> Bool ? – Douglas Pfeifer Feb 07 '20 at 22:28
  • 1
    @DouglasPfeifer: Then you would define a `func existsInArray(_ numbers: [Int], lowIndex: Int, highIndex: Int _ k: Int) -> Bool` function. Or without recursion, similarly as in https://stackoverflow.com/a/26679191/1187415. – Martin R Feb 07 '20 at 22:34
  • @MartinR instead of creating two methods you could create a generic one that would accept any collection. Something like `func exists(in elements: C, _ k: I) -> Bool where C.Element: FixedWidthInteger, C.Index == Int { guard !elements.isEmpty else { return false } let halfIndex = elements.startIndex + elements.count / 2 return k == elements[halfIndex] ? true : k < elements[halfIndex] ? exists(in: elements[.. – Leo Dabus Feb 08 '20 at 01:57
  • or extending FixedWidthInteger `extension FixedWidthInteger { func exists(in elements: C) -> Bool where C.Element == Self, C.Index == Int { if elements.isEmpty { return false } let halfIndex = elements.startIndex + elements.count / 2 return self == elements[halfIndex] ? true : self < elements[halfIndex] ? exists(in: elements[.. – Leo Dabus Feb 08 '20 at 02:00
  • 2
    @LeoDabus: Thanks for the suggestions, you are of course right. (My intention was mainly to explain the problem, not to provide the most general binary search method.) – Martin R Feb 08 '20 at 04:17
  • @MartinR Instead of constraining `Index == Int` we can do `let midIndex = index(startIndex, offsetBy: count / 2)` and `index(after: midIndex)` Is there a `RandomAccessCollection` that's not `Int` indexed? – Leo Dabus Feb 08 '20 at 04:40
  • 2
    @LeoDabus: That's exactly what I was just thinking about :) – Martin R Feb 08 '20 at 04:47
  • 1
    Isn't *slicing* still a bit more efficient in the non-recursive version? `func binarySearch(for element: Element) -> Bool { var slice : SubSequence = self[...] while !slice.isEmpty { let mid = slice.index(slice.startIndex, offsetBy: slice.count / 2) if element == slice[mid] { return true } else if element < slice[mid] { slice = slice[index(after: mid)...] } else { slice = slice[.. – vadian Feb 08 '20 at 08:23
  • @vadian: A quick benchmark showed no difference (but note that there is an error in your code, the two else blocks must be exchanged). However, the non-recursive version is significantly faster than the recursive version. – Martin R Feb 08 '20 at 09:28
  • `Slice` has been promoted so enthusiasticly on WWDC that I assumed it provides benefit in all situations where a subsequence is needed. You're right about the error. This comes from quickly c&p things together. – vadian Feb 08 '20 at 09:37