
Just ran into TypeError: no implicit conversion of Set into Array doing something of the form

thingGetterSet.map{|m| m.getThing }.select{|t| t.appropriate? } + appropriateThingsSet

It turns out Set#map returns an Array, where I expected a Set to map to another Set.

Even if that had returned a set, Set#select also returns an Array, where I also expected a Set.
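
A minimal reproduction, distilled from the code above (the names are illustrative, not from my real code):

```ruby
require "set"

s = Set[1, 2, 3]

# Both Enumerable methods hand back a plain Array, not a Set:
s.map { |x| x * 10 }    #=> [10, 20, 30]   (an Array)
s.select(&:odd?)        #=> [1, 3]         (an Array)

# Array#+ then refuses the Set on the right-hand side:
# [10, 20, 30] + Set[40]   # raises TypeError: no implicit conversion of Set into Array
```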

What's going on here? Why is Ruby imposing a random ordering and allowing duplication on my things that are canonically non-ordered and where I want to prohibit duplication?

— ShadSterling (edited by anothermh)
  • The class `Set` includes the module `Enumerable` to make `Enumerable` instance methods available for use by instances of `Set`. [Enumerable#map](https://ruby-doc.org/core-2.7.0/Enumerable.html#method-i-map) and [Enumerable#select](https://ruby-doc.org/core-2.7.0/Enumerable.html#method-i-select) return arrays regardless of the class of their receiver (e.g., arrays, hashes and so on). – Cary Swoveland Mar 24 '20 at 01:17
  • Make your life easier now: switch to [snake case](https://github.com/github/rubocop-github/blob/master/STYLEGUIDE.md#naming). – anothermh Mar 24 '20 at 01:20
  • Ruby can't possibly know that the result of `Set#map` is a set. For example `Set[1,2,3].map(&:odd?)` – max Mar 24 '20 at 01:21
  • @max I'm not sure that's the reason, since `map!` is implemented for Set (`x = Set.new([1,2,3]); x.map!(&:odd?)`) – spike Mar 24 '20 at 01:23
  • @spike `map!` mutates the receiver, so it makes sense that it actually returns a set. `.map` by definition iterates across an object and returns an array. – max Mar 24 '20 at 01:26
  • `set.dup.map!(&:odd?) == Set[true, false]` is a pretty entertaining way to see if a set contains both even and odd numbers though. – max Mar 24 '20 at 01:33
  • @spike, if you look through a list of `Enumerable` methods you will see that there are no `bang!` methods and none of them modify their receiver. Many classes that `include Enumerable` implement custom versions of some enumerable methods, sometimes because it is desired to return a different class of object (e.g., [Hash#select](https://ruby-doc.org/core-2.7.0/Hash.html#method-i-select)) but more often to improve performance (e.g., [Array#map](https://ruby-doc.org/core-2.7.0/Array.html#method-i-map)). – Cary Swoveland Mar 24 '20 at 02:11
  • @max: "Ruby can't possibly know that the result of `Set#map` is a set." – Of course, it can. For example, in Smalltalk and Scala, it knows just that. It's trivial to do in Ruby, as well: just override all `Enumerable` methods in all subtypes of `Enumerable`! Et voila: type-preserving collection methods. "For example `Set[1,2,3].map(&:odd?)`" – The result of that is obviously `Set[true, false]`, no problem there. – Jörg W Mittag Mar 24 '20 at 05:12

2 Answers


There are, in general, two different styles of collection operation libraries:

  • type-preserving: that is what you expect in your question
  • generic (not in the "parametric polymorphism sense" but the standard English sense of the word) or maybe "homogeneous"

Type-preserving collection operations try to preserve the type exactly for operations like select, take, drop, etc., which only take existing elements unmodified. For operations like map, they try to find the closest supertype that can still hold the result. E.g. mapping an IntSet to Strings can obviously not result in an IntSet, only in a Set. Mapping an IntSet to Booleans could be represented in a BitSet, but I know of no collections framework that is clever enough to actually do that.
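
For illustration, here is a rough Ruby sketch of what type-preserving behavior could look like (a hypothetical subclass, not how `Set` actually works):

```ruby
require "set"

# Hypothetical subclass: filtering and mapping rebuild an instance of
# the receiver's own class instead of falling back to Array.
class TypePreservingSet < Set
  def select(&block)
    self.class.new(to_a.select(&block))
  end

  def map(&block)
    self.class.new(to_a.map(&block))
  end
end

s = TypePreservingSet[1, 2, 3, 4]
s.select(&:odd?)     #=> #<TypePreservingSet: {1, 3}>
s.map { |x| x * 2 }  #=> #<TypePreservingSet: {2, 4, 6, 8}>
```

Note that this already hard-codes a policy decision: `map` here silently collapses duplicates, which is exactly the cardinality problem that comes up below.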

Generic / homogeneous collection operations always return the same type. Usually, this type is chosen to be very general, to accommodate the widest range of use cases. For example, in .NET, collection operations return IEnumerable; in Java, they return Streams; in C++, they return iterators.
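
Ruby's `Enumerable` is firmly in this second camp: the same `map` returns an `Array` no matter what class the receiver is. A quick check:

```ruby
require "set"

# One Enumerable#map, four different receivers, one result class:
[[4, 5], Set[1, 2], { a: 1 }, (1..3)].each do |collection|
  puts "#{collection.class} -> #{collection.map { |e| e }.class}"
end
#=> Array -> Array
#   Set -> Array
#   Hash -> Array
#   Range -> Array
```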

Until very recently, it was only possible to implement type-preserving collection operations by duplicating all operations for all types. For example, the Smalltalk collections framework is type-preserving, and it does this by having every single collections class re-implement every single collections operation. This results in a lot of duplicated code and is a maintenance nightmare. (It is no coincidence that many newly invented object-oriented abstractions have their first paper written about how they can be applied to the Smalltalk collections framework. See Traits: Composable Units of Behaviour for an example.)

To my knowledge, the Scala 2.8 re-design of the collections framework (see also this answer on SO) was the first time someone managed to create type-preserving collections operations while minimizing (though not eliminating) duplication. However, the Scala 2.8 collections framework was widely criticized as being overly complex, and it has required constant work over the last decade. In fact, it actually led to a complete re-design of the Scala documentation system as well, just to be able to hide the very complex type signatures that the type-preserving operations require. But this still wasn't enough, so the collections framework was completely thrown out and re-designed yet again in Scala 2.13. (And this re-design took several years.)

So, the answer to "why isn't the Ruby collections framework type-preserving" is quite simple, actually: because Ruby was created in 1993, and we (by that I mean the programming community at large) didn't figure out how to properly do it until 2019, 26 years later.

Also note that Scala's implementation relies heavily on static typing. Not only on static typing but on compile-time type-level programming, compile-time type-level introspection and compile-time type-level metaprogramming. These are not essential, but they do mean that you cannot simply copy their solution to Ruby. For example, Scala will use type classes and implicit search to figure out that the best possible match for

IntSet(1, 2, 3).map(_.toString)
//=> val res: Set[String] = Set("1", "2", "3")

is a Set[String] at compile-time. In Ruby, you could obviously still run the same search algorithm, although it would be much slower because you need to run it at runtime, over and over and over again for every single time you run map. It would be slower, but it would be possible: it's just an algorithm, if you can run it at compile time, then you can run it at runtime as well. BUT! The algorithm needs the return type of the block as one of its arguments! In Scala, this is inferred at compile time. How do you know that in Ruby?
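
A hypothetical sketch of that problem: any runtime "type search" in Ruby can only discover the block's result types after calling the block for every element:

```ruby
require "set"

# Hypothetical helper: run the block eagerly, *then* inspect what came out.
def map_and_inspect(set, &block)
  results = set.map(&block)                 # must execute the block first
  element_classes = results.map(&:class).uniq
  [element_classes, Set.new(results)]
end

classes, mapped = map_and_inspect(Set[1, 2, 3], &:odd?)
classes  #=> [TrueClass, FalseClass]
mapped   #=> #<Set: {true, false}>  (cardinality collapsed from 3 to 2)
```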

But even in Scala, it is sometimes not possible to find a good match, e.g. here:

val m = Map(1 → "one", 2 → "two", 3 → "three")

m.map { case (k, v) ⇒ s"$k $v" }
//=> val res: Iterable[String] = List("1 one", "2 two", "3 three")

So, the best possible static type Scala could find is Iterable, which is actually the very top of the Scala collections hierarchy, and the best possible runtime type Scala could find is List, which is actually the "go-to" collection type in Scala, similar to Array in Ruby. In other words, this is actually Scala saying "I give up, man."

There is also another wrinkle, in that type-preserving collections operations challenge what we consider to be part of the contract of certain operations. For example, most people would argue that the cardinality of a collection should be invariant under map, in other words, map should map each element to exactly one new element and thus map should never change the size of the collection. But, what about this hypothetical code with a type-preserving Ruby collections framework:

Set[1, 2, 3].map(&:odd?)
#=> Set[true, false]

There are also some other interesting cases, where I don't even know what the return type should be in a type-preserving collections framework, for example, what about Ranges or IO streams:

(1..1000).map(&:odd?)     # => an Array of 1000 booleans; there is no "Range of booleans"
(1..1000).select(&:odd?)  # => an Array [1, 3, 5, ...]; a Range cannot have gaps

File.open('bla').map(&:upcase)  # => an Array of upcased lines; what would a "mapped IO" even be?

Because of all of this, the designers of Ruby chose to have homogeneous collections operations in Ruby: every collection operation always returns an Array.

Well. Okay. Except sometimes they do choose to override them. E.g. in Hash, the filtering operations select, reject, etc. actually do return a Hash. But note that this is a recent change, and has in fact an interesting history:

  • In Ruby 1.8 (and possibly earlier), Hash#select returned an Array, but Hash#reject returned a Hash!
  • In Ruby 1.9, this was changed so that both return a Hash.
  • However, find_all, which is defined in Enumerable as an alias to select, is still to this day not overridden in Hash, and thus it returns an Array!
  • On the other hand, the newly-introduced filter, which is also defined as an alias to select in Enumerable, is overridden in Hash to return a Hash.
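
This asymmetry is easy to verify (behavior as of Ruby 2.6, when filter was introduced):

```ruby
h = { a: 1, b: 2, c: 3 }

h.select   { |_k, v| v.odd? }.class   #=> Hash   (overridden in Hash)
h.filter   { |_k, v| v.odd? }.class   #=> Hash   (aliased to Hash#select)
h.reject   { |_k, v| v.even? }.class  #=> Hash
h.find_all { |_k, v| v.odd? }.class   #=> Array  (falls through to Enumerable)
```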

So, the Ruby designers chose simplicity (no code duplication, no complex type computation at runtime, all operations always return arrays so there are no surprises like in the example above where map changes the size of the set, etc.) over correctness, and made the Ruby collection operations homogeneous instead of type-preserving. But then they also chose pragmatism over purity and sprinkled some small number of type-preserving overrides here and there.

So, the fact that Set#map returns an Array should not be terribly surprising because that's what every single other class in the collections framework does as well. Changing this only for Set#map is not a good idea, IMO. If we do this, it should be done for all implementors of map. But that is a major breaking change, and thus will have to wait until Ruby 3 at the earliest. (Actually, matz has said that he wants to avoid breaking changes in Ruby 3.) But even changing only map for all implementors is strange, if we do this, it should be done for all operations. This is a major research task, and is thus much too late for Ruby 3, so this would have to wait until at least Ruby 4.

What we can argue about, however, is whether Array is the right choice for the universal collection type. You might notice that other similar frameworks choose a very general type: .NET has IEnumerable, Java has Stream, C++ has iterators. The equivalent in Ruby would be Enumerator. Maybe, Enumerator should be the type that is returned by all collections operations. For example, if you map over an infinite set, the result will again be infinite, but it will be an Array, which means it needs an infinite amount of memory!
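
Ruby's Enumerator::Lazy shows what that could look like: mapping over an unbounded range without ever materializing an Array:

```ruby
# An eager map over an endless range would try to build an infinite Array;
# the lazy enumerator only computes elements on demand.
squares_of_odds = (1..).lazy.select(&:odd?).map { |x| x * x }

squares_of_odds.class     #=> Enumerator::Lazy
squares_of_odds.first(4)  #=> [1, 9, 25, 49]
```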

This brings us back to pragmatism, though: in the majority of use cases, an Array is more useful than an Enumerator.

— Jörg W Mittag

As one of the comments said, the issue here is that Set includes Enumerable as the provider of a bunch of methods including map. This is a language design decision, so the best answer I can give is to point you to the developer's mailing list. This is an interesting thread from a few years ago: "Hash#select return type does not match Hash#find_all", particularly "[Ruby trunk Bug#13795] Hash#select return type does not match Hash#find_all".

I saw a few reasons in the thread for not re-implementing map and others in Set:

  1. This is counter to the benefit of having a general purpose Enumerable, causing extra implementation work for classes that include that module as they'd have to re-implement a bunch of methods (though this wasn't the major reason).
  2. There are uses of map where it's unclear that keeping the original data structure is a good idea (for example `:odd?` in the comments). There's also an example in the mailing list thread of

    {}.map {|k, v| "#{k}-#{v}"}
    

    which wouldn't work returning a Hash.

  3. There are other Standard Library classes, like Range and IO, where keeping their type post-map wouldn't make sense, so it's more consistent for the default behavior to return an array.
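
That mailing-list example generalizes: the block yields Strings, not key/value pairs, so there is nothing left to rebuild a Hash from:

```ruby
h = { 1 => "one", 2 => "two" }

# Each block call returns a String; the only sensible container is an Array:
h.map { |k, v| "#{k}-#{v}" }  #=> ["1-one", "2-two"]
```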
— spike (edited by the Tin Man)