3

I'm currently fetching a lot of objects that contains both names and coordinates of streets. The returned array has around 22.000 objects and the resulting array we want has around 4000, the rest are duplicates. The problem with this kind of data is that the fetched objects can have the same name but different coordinates, and i'm only interested by getting objects based on unique names. If there are more than one object with the same name, i'll only want to keep the first object.

So far i've been trying to loop through the streets by comparing the names. I'd rather use filter or some other more performance efficient solution.

My struct

struct StreetName {
    var name: String
    var polyLine: CLLocationCoordinate2D
}

My code so far

DataManager.shared.getStreetNames { (streets) in  
    var namesArray: [StreetName] = []
    for streetName in streets {
        let name = streetName.name
        if namesArray.count == 0 {
            namesArray.append(streetName)
        } else if namesArray.contains(where: {$0.name == name }) { 
             /* Dont add */ 
        } else {
             namesArray.append(streetName)
        }
    }

    self.streetNames = namesArray.sorted(by: {$0.name < $1.name})
    self.filteredStreetNames = self.streetNames
    OperationQueue.main.addOperation {
        self.streetTableView.reloadData()
    }
}

This code block works, but runs in around 30 seconds on an iPhone X. Which is way too slow. Any ideas?

Jacob Ahlberg
  • 2,352
  • 6
  • 22
  • 44

3 Answers3

2

I think if you profile this, you'll find that the sort is taking the most time. I can't find an official note, but there's a good chance that the underlying implementation is quick sort, which has it's worst complexity when the array is already sorted (or the array is sorted in reverse order).

The average case complexity for quick sort is O(n log n), but in the worst case it's O(n2).

I think you should implement insertion sort instead, or, more accurately, always insert the new elements into an already sorted position. This should reduce your complexity to O(n) for the entire function.

Pseudocode:

  • Fetch street names
  • For each street name
    • find the position in the existing array where the street name would go (I suggest binary search since the array is already sorted)
    • if the street name already exists, skip
    • if the name doesn't exist, insert it.

The result should be a sorted array of unique street names requiring each name to only be read and inserted once.

James Webster
  • 31,873
  • 11
  • 70
  • 114
  • If the array is already "somewhat" sorted which is not optimal for quick sort, how about `shuffle` then `quick sort` then `reduce`? – CouchDeveloper Apr 27 '18 at 07:59
  • I also like the idea of this, but have a hard time trying to figure out how to make this pseudo code into code. Any pointers? – Giovanni Palusa Apr 27 '18 at 08:11
  • In the suggested [duplicate](https://stackoverflow.com/questions/25738817/removing-duplicate-elements-from-an-array) the [answer of mxcl](https://stackoverflow.com/questions/25738817/removing-duplicate-elements-from-an-array/46354989#46354989) provides the most efficient way to remove duplicates. By comparison the time to sort the array is negligible. – vadian Apr 27 '18 at 08:18
  • Swift uses *introsort* (a variant of quick sort), compare https://stackoverflow.com/q/27677026/1187415. Whether an already sorted array is the worst case or not depends on the choice of the pivot element. [Apparently](https://github.com/apple/swift/blob/master/stdlib/public/core/Sort.swift.gyb#L191) Swift uses the middle value from the first, middle, last element. – Martin R Apr 27 '18 at 08:25
  • @vadian - Doesn't this compare equal objects? The objects in this case is unique, but some values might be same... – Giovanni Palusa Apr 27 '18 at 08:26
  • @vadian If the original array is sorted (or can be sorted and used as the input) and if the order of the result is not relevant, there seems to be a better algorithm than the proposed solution from mxcl. – CouchDeveloper Apr 27 '18 at 09:00
1

My take on this:

// Given an array of elements (here just Ints):
let array = (0..<1000).map { _ in Int(arc4random_uniform(100)) }

// Sort it:
let sorted = array.sorted()

// Define an empty result (array of elements) which is a variable 
// and which gets modified in the subsequent reduce function:
var unique: [Int] = []

// A tailored reduce which depends on a sorted array and appends 
// to the result IFF that element is not the last in result:
let result = sorted.reduce(into: unique) { (result, element) in
    if let last = result.last, last == element {
    } else {
        result.append(element)
    }
}

Finally, print the result:

print(array)

Example output on the console: console [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

CouchDeveloper
  • 18,174
  • 3
  • 45
  • 67
1

@MartinR solved this by using Sets.

My new updated struct

struct StreetName: Hashable {
    static func == (lhs: StreetName, rhs: StreetName) -> Bool {
        return lhs.name == rhs.name
    }

    var hashValue: Int {
        return name.hashValue
    }

    var name: String
    var polyLine: CLLocationCoordinate2D
}

My new updated code

DataManager.shared.getStreetNames { (returnedNamesSet) in
    var namesArray: [StreetName] = Array(returnedNamesSet)

    self.streetNames = namesArray.sorted(by: {$0.name < $1.name})
    self.filteredStreetNames = self.streetNames
    OperationQueue.main.addOperation {
        self.streetTableView.reloadData()
    }
}


Results:

The process time went from 30 seconds to 0.4 seconds by using Set

Jacob Ahlberg
  • 2,352
  • 6
  • 22
  • 44