3

I discovered Set a few days ago and I'm now using it every time I need a sequence without duplicates, even if I'm sure there won't be any. Because of this, I often need to use the toSet method in my code.

So I'm now wondering: is it good practice to use Set instead of Seq every time I need a sequence without duplicates?

Simon

2 Answers

5

Sets have an additional important property: they have no defined order. If your collection fits this, then using Set is a good idea. (So if you reach for toSet all over the place, it probably is a Good Idea (tm).)

If the order is defined by some property of the values in the Set, you can use SortedSet.

If, on the other hand, it is a sequence of values with a defined order that is not based on some property of the values, but you still want unique values, then Set is not a good fit. You can use Seq.distinct to keep a Seq but without duplicates.
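To make the three cases concrete, here is a minimal sketch. The `visits` data and the object name are made up for illustration; the point is only how `distinct`, `SortedSet`, and `toSet` treat order.

```scala
import scala.collection.immutable.SortedSet

object UniqueCollections {
  // Hypothetical sample data: the order of arrival matters and there are duplicates.
  val visits: Seq[String] = Seq("carol", "alice", "bob", "alice", "carol")

  // distinct keeps the first occurrence of each element, in the original order.
  val inOrder: Seq[String] = visits.distinct

  // SortedSet orders by a property of the values (here: natural String ordering).
  val sorted: SortedSet[String] = SortedSet.empty[String] ++ visits

  // A plain Set only guarantees uniqueness; do not rely on its iteration order.
  val unordered: Set[String] = visits.toSet

  def main(args: Array[String]): Unit = {
    println(inOrder)        // List(carol, alice, bob)
    println(sorted.toList)  // List(alice, bob, carol)
    println(unordered.size) // 3
  }
}
```

If the result has to keep "order of arrival", `distinct` is the one to use; the other two either impose their own order or promise none.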

johanandren
  • Can you please explain a bit more "that is not based on some property of the values"? – Simon Mar 03 '15 at 21:29
  • When the order is for example the order they were added to the collection as opposed to for example the username of the users in a list which would be a property of the values. – johanandren Mar 03 '15 at 21:30
5

If you are sure that you won't have duplicates, then you should use a Seq like Vector. The reason is that Set has extra overhead: it has to hash every element and possibly check equality against some other elements. Depending on how many elements you have and how complex they are, this may be something you want to avoid.

A demonstration:

// Wrapper that logs every hashCode and equals call,
// so the work each collection does becomes visible.
class A(val name: Int) {
  override def hashCode() = {
    println(s"hashing $name")
    name.hashCode
  }
  override def equals(other: Any) = other match {
    case a: A =>
      println(s"$name =?= ${a.name}")
      name == a.name
    case _ => false
  }
}


val elements = (0 to 10).map(new A(_))

println("TO VECTOR")
val seq = Vector.empty ++ elements

println("TO SET")
val set = Set.empty ++ elements

prints:

TO VECTOR    // Notice no extra work was done
TO SET       // Lots of extra stuff done:
1 =?= 0
2 =?= 0
2 =?= 1
3 =?= 0
3 =?= 1
3 =?= 2
4 =?= 0
4 =?= 1
4 =?= 2
4 =?= 3
hashing 0
hashing 1
hashing 2
hashing 3
hashing 4
hashing 5
hashing 6
hashing 7
hashing 8
hashing 9
hashing 10
dhg
  • Ok thanks! Can you please tell me if I should choose Vector instead of Seq even if I don't need to process operations with the sequence (that could be faster with an indexed sequence)? – Simon Mar 03 '15 at 22:16
  • `Seq` is not a class; it is a trait. A `Vector` is a kind of `Seq`. So it's not "instead of", it is using a `Seq`. Maybe you're asking if `Vector` is better than a `List`? It has [been argued](http://stackoverflow.com/questions/6928327/when-should-i-choose-vector-in-scala) that a `Vector` is faster than a `List` for most purposes, making it a good general-purpose `Seq` implementation. – dhg Mar 03 '15 at 22:24
  • Thus when I say `val seq = Seq()` it's the `Vector class` that is chosen? – Simon Mar 03 '15 at 22:27
  • `Seq()` creates a `List`. You can see this if you do `println(Seq())`. – dhg Mar 03 '15 at 22:28
  • So I should write `val seq = Vector()` (or `Vector.empty`)? (Sorry for my dumb questions, I'm new to `Scala`) – Simon Mar 03 '15 at 22:30
  • They ultimately do the same thing, but people tend to prefer `Vector.empty`. It's also maybe like a millisecond faster since it just [returns a static object](https://github.com/scala/scala/blob/v2.11.6/src/library/scala/collection/immutable/Vector.scala#L26) instead of [creating a builder](https://github.com/scala/scala/blob/v2.11.6/src/library/scala/collection/generic/GenericCompanion.scala#L45) and calling a method on it. – dhg Mar 03 '15 at 22:32
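The point from the comments, that `Seq` is a trait and `Seq(...)` hands back a `List` by default, can be checked with a small sketch. The object and value names are made up for illustration, and the default described is the one from the standard library versions discussed in this thread; treat it as an implementation detail rather than a guarantee.

```scala
object SeqDefaults {
  // Seq is a trait; its companion's apply picks a concrete implementation.
  val viaSeq: Seq[Int] = Seq(1, 2, 3)

  // Asking for a Vector explicitly still gives you something usable as a Seq.
  val viaVector: Seq[Int] = Vector(1, 2, 3)

  def main(args: Array[String]): Unit = {
    println(viaSeq.isInstanceOf[List[_]])      // true
    println(viaVector.isInstanceOf[Vector[_]]) // true
    // Sequences compare element by element, regardless of implementation.
    println(viaSeq == viaVector)               // true
  }
}
```

So typing a value as `Seq` while constructing it with `Vector(...)` or `Vector.empty` keeps the code generic and still selects the implementation you want.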