0

Say I have an array of strings:

let arrayOfStrings = ["a", "b", "a", "c", "a", "d"]

How would I get rid of the duplicates?

Duncan C
  • 128,072
  • 22
  • 173
  • 272

1 Answers1

17

You can use the array function contains(_:) to check if an element is already part of the array, but that is fairly slow, and for large arrays it won’t perform well. (1.) Better to copy the entries into a Set and use Set operations to find and remove the duplicates. Sets are optimized to make testing for set membership fast, so if aSet.contains(item) is a lot faster than if anArray.contains(item).

If you don't care about preserving the order of your items, you can simply copy your array into a set and then back to an array. However, that does mean that the items in the resulting array will be in a different order.

A function to remove duplicates from an array of strings, while preserving the order, might look like this:

func uniqueElementsFrom(array: [String]) -> [String] {
  //Create an empty Set to track unique items
  var set = Set<String>()
  let result = array.filter {
    guard !set.contains($0) else {
      //If the set already contains this object, return false
      //so we skip it
      return false
    }
    //Add this item to the set since it will now be in the array
    set.insert($0)
    //Return true so that filtered array will contain this item.
    return true
  }
  return result
}

If you call it with code like this:

let arrayOfStrings = ["a", "b", "a", "c", "a", "d"]
let uniqueStrings = uniqueElementsFrom(array:arrayOfStrings)
print("Unique elements from \(arrayOfStrings) = \n” + 
  “\(uniqueStrings)")

The output would be

Unique elements from ["a", "b", "a", "c", "a", "d"] =

[“a”, "b", "c", "d"]

However, that function only works with arrays of strings. It would be good if we could write a function that could remove duplicates from any kind of array.

This is a job for Generics. There is a catch however. Sets can only contain objects that conform to the Hashable protocol, since Sets use hashes to make testing for set membership faster.

We can rewrite the uniqueElementsFrom(array:) function to take any array that conforms to the Hashable protocol using Generics. That code looks like this:

func uniqueElementsFrom<T: Hashable>(array: [T]) -> [T] {
  var set = Set<T>()
  let result = array.filter {
    guard !set.contains($0) else {
      return false
    }
    set.insert($0)
    return true
  }
  return result
}

The <T: Hashable> bit after the function name says "The rest of this function will refer to a type T which is unspecified. The only thing you can be sure of is that the type T will conform to the Hashable protocol."

This form of the uniqueElementsFrom(array:) function will work on any array who’s elements are Hashable.


(1.) For arrays, contains(_:) has O(n) performance, and so looping through an array, testing the array to see if it contains each new element with contains(_:) has performance that's almost O(n^2), which is really, really bad for anything but a small array. I'm pretty sure that Set's contains(_:) function has constant time performance, so the whole process would have O(n) performance.

Community
  • 1
  • 1
Duncan C
  • 128,072
  • 22
  • 173
  • 272
  • Would this help if I have an array of objects? And I need to filter same latitude-longitudes contained inside the object? – Lohith Korupolu Aug 09 '17 at 09:35
  • 1
    @LohithKorupolu, yes, you could use the same approach. CLLocation objects are Equable, so you could make your array of objects contain CLLocations, and compare them using `==`. Note however that the locations would have to be EXACTLY the same. If you are trying to tell if multiple GPS readings represent the same location, using `==` to compare locations won't work because the locations will be different by a tiny amount. – Duncan C Sep 13 '17 at 22:32
  • @DuncanC, Thank you very much. this solution is best to remove duplicate string from array. – AzeTech Jan 18 '20 at 05:44
  • Actually I think Jessy's answer in this thread: https://stackoverflow.com/questions/25738817/removing-duplicate-elements-from-an-array-in-swift is more elegant and flexible. He suggests 2 different extensions to `Sequence`, one for a sequence that's `hashable`, and one for a sequence that is `equatable`. The `hashable` version will be a LOT faster for long sequences, but at least you have an option for non-hashables. – Duncan C Jan 18 '20 at 14:58
  • @LohithKorupolu I suggest creating custom implementation of == for CLLocation objects that uses a "slop" or "epsilon" value that treats locations that are very close as the same. You could define a global static var that your extension uses. A naive way to calculate "close enough" is to use Pythagorean distance between the locations. That doesn't allow for the fact that lines of longitude get much closer as you approach the poles however. – Duncan C May 02 '20 at 23:45
  • The rigorous way would be to calculate the "great circle distance" between the points using the Haversine formula, but that is computationally expensive. If you're not near the poles then Pythagorean distance should be fine, and much faster than calculating distances using the Haversine formula. (https://en.wikipedia.org/wiki/Haversine_formula) – Duncan C May 02 '20 at 23:47