I really like functional programming concepts, but I've now been bitten on two separate occasions by the same gotcha: mapping across a collection which happens to be a Set (i.e. one that automatically removes duplicates). The issue is that after transforming the elements of such a set, the output container is also a set, and so removes any duplicates from the transformed output.
A very brief REPL session to illustrate the issue:
scala> case class Person(name: String, age: Int)
defined class Person
scala> val students = Set(Person("Alice", 18), Person("Bob", 18), Person("Charles", 19))
students: scala.collection.immutable.Set[Person] = Set(Person(Alice,18), Person(Bob,18), Person(Charles,19))
scala> val totalAge = (students map (_.age)).sum
totalAge: Int = 37
I would of course expect the total age to be 18 + 18 + 19 = 55, but because the students were stored in a Set, so were their ages after the mapping, hence one of the 18s disappeared before the ages were summed.
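To make the mechanism concrete, here is a minimal sketch of the same data with a few ways of summing that keep one age per student. The object name SetMapDemo and the val names are only illustrative, and the alternatives shown are options rather than a recommendation of any one of them.

```scala
object SetMapDemo {
  case class Person(name: String, age: Int)

  val students: Set[Person] =
    Set(Person("Alice", 18), Person("Bob", 18), Person("Charles", 19))

  // map on a Set builds another Set, so the two 18s collapse into one.
  val ages: Set[Int] = students.map(_.age)               // Set(18, 19)
  val dedupedTotal: Int = ages.sum                        // 37

  // Converting to a Seq first keeps one age per student.
  val viaSeq: Int = students.toSeq.map(_.age).sum         // 55

  // An Iterator has no set semantics, so nothing is dropped.
  val viaIterator: Int = students.iterator.map(_.age).sum // 55

  // A fold never builds an intermediate collection at all.
  val viaFold: Int = students.foldLeft(0)(_ + _.age)      // 55
}
```

The students themselves are still distinct in every variant; only the intermediate collection of ages differs.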
In real code this is often more insidious and harder to spot, especially if you write utility code which simply takes a Traversable and/or use the output of methods which are declared to return a Traversable (the implementation of which happens to be a Set). It seems to me that these situations are almost impossible to spot reliably, until/unless they manifest as a bug.
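A sketch of that utility-code scenario, under some assumptions: the object and method names (TraversableDemo, naiveTotalAge, safeTotalAge) are hypothetical, and the parameter is typed as Iterable rather than Traversable so the example also compiles on newer Scala versions, where Traversable is deprecated. The point is that the static type gives no hint about set semantics, while the runtime collection still decides what map builds.

```scala
object TraversableDemo {
  case class Person(name: String, age: Int)

  // Hypothetical utility: nothing in the signature reveals whether
  // duplicates in the mapped result will survive.
  def naiveTotalAge(people: Iterable[Person]): Int =
    people.map(_.age).sum

  // Defensive variant (one possible approach): map over an Iterator,
  // which never deduplicates, whatever the underlying collection is.
  def safeTotalAge(people: Iterable[Person]): Int =
    people.iterator.map(_.age).sum

  val asList: Iterable[Person] =
    List(Person("Alice", 18), Person("Bob", 18), Person("Charles", 19))

  // Same elements, same static type, but backed by a Set at runtime.
  val asSet: Iterable[Person] = asList.toSet
}
```

With these definitions, naiveTotalAge(asList) gives 55 but naiveTotalAge(asSet) gives 37, while safeTotalAge gives 55 for both.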
So, are there any best practices which will reduce my exposure to this issue? Am I wrong to think about mapping over a general Traversable as conceptually transforming each element in place, as opposed to adding the transformed elements in turn into some new collection? Should I call .toStream on everything before mapping, if I want to keep this mental model?
Any tips/recommendations would be greatly appreciated.
Update: Most of the answers so far have focused on the mechanics of including the duplicates in the sum. I'm more interested in the practices involved when writing code in the general case - have you drilled yourself to always call toList on every collection before calling map? Do you fastidiously check the concrete classes of all the collections in your app before calling methods on them? Etc.
Fixing up something that's already been identified as a problem is trivial - the hard part is preventing these errors from creeping in in the first place.