
I am learning Scala from the Scala for Data Science book and its companion GitHub repo. My question is about this function, copied below for reference.

    def fromList[T: ClassTag](index: Int, converter: String => T): DenseVector[T] =
      DenseVector.tabulate(lines.size) { row => converter(splitLines(row)(index)) }

What does DenseVector.tabulate(lines.size) mean, sitting between the = sign and the braces? I am new to Scala (with a background in Python and C++), so I cannot figure out whether DenseVector.tabulate(lines.size) is a local variable of the function being defined (in which case it should be declared inside the definition) or something else. From what I understand of Scala syntax, it cannot be the return type.

Also, is the ClassTag equivalent to template in C++?

To help you answer the question,

  • splitLines has type scala.collection.immutable.Vector[Array[String]]
  • lines.size is a non-negative Int, since Scala has no unsigned integer type (obvious, but still making it clear)
Dmytro Mitin
Della
  • `T: ClassTag` means runtime information about the type T is passed along to the function. For more info: https://stackoverflow.com/q/40202504/757071. – Johny T Koshy Mar 29 '23 at 04:31
  • `DenseVector.tabulate` is a curried function that creates a DenseVector of a given size and fills each position by applying a function to its index. For more about currying: https://stackoverflow.com/q/62448617/757071 – Johny T Koshy Mar 29 '23 at 04:34
  • See this also for `ClassTag`: https://stackoverflow.com/q/34495711/757071 – Johny T Koshy Mar 29 '23 at 04:42

2 Answers


DenseVector.tabulate is a factory function (defined on the companion object of DenseVector) that has two parameter lists with one parameter each (so altogether, it takes two explicit parameters: size: Int and a function f: Int => V).

You can find its definition here (as part of the breeze library).
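To see what the two parameter lists look like at the call site, here is a minimal sketch. It uses `Vector.tabulate` from the Scala standard library as a stand-in for breeze's `DenseVector.tabulate` (an assumption made so the snippet runs without breeze; both take a size in the first list and an `Int => V` function in the second). The object name `TabulateDemo` is hypothetical.

```scala
object TabulateDemo {
  // Second parameter list written with parentheses.
  val withParens: Vector[Int] = Vector.tabulate(4)(i => i * i)

  // The same call: a parameter list with a single argument may be written
  // with braces instead, which is the form used in the question.
  val withBraces: Vector[Int] = Vector.tabulate(4) { i => i * i }

  def main(args: Array[String]): Unit = {
    println(withParens) // prints Vector(0, 1, 4, 9)
    println(withBraces) // prints Vector(0, 1, 4, 9)
  }
}
```

So in `fromList`, `DenseVector.tabulate(lines.size) { ... }` is one method call whose result is the return value; the braces hold the second argument, not a separate function body.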

In (pseudo-)C++ (ignoring the ClassTag), the corresponding declaration would probably look something like this:

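    #include <functional>

    template<typename V>
    class DenseVector {
    public:
        // ... other class members

        // Static factory: builds a vector of `size` elements where element i is f(i).
        // It reuses the class's V, so it needs no template header of its own.
        static DenseVector<V> tabulate(int size, std::function<V(int)> f);
    };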

and then fromList would probably look something like this:

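    // Assumes `lines` and `splitLines` are containers visible in the enclosing scope.
    template<typename T>
    static DenseVector<T> fromList(int index, std::function<T(std::string)> converter) {
        // Capture both `converter` and `index`, since the lambda uses both.
        return DenseVector<T>::tabulate(lines.size(), [&converter, index](int row) {
            return converter(splitLines[row][index]);
        });
    }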

MartinHH
  • P.S.: the type system of Scala is fundamentally different from that of C++, so the following is far from correct, but: if you want to focus on other aspects of your learning path for a while and postpone learning about implicit context bounds until later, you can think of `[T: ClassTag]` as somewhat similar to `template requires ClassTag`. – MartinHH Mar 29 '23 at 05:33
  • Thanks a lot. Yes, how much of the language I need to learn before daring to jump in and start delivering (reading or writing) production code is always a dilemma, and it seems Scala has a lot of ground to cover. – Della Mar 29 '23 at 06:08

Your example uses several pieces of syntactic sugar.

It is equivalent to the following, which might be easier to read when starting out:

    def fromList[T](index: Int, converter: String => T)(implicit classtag: ClassTag[T]): DenseVector[T] = {
      // tabulate expects an Int => T function, so each row index maps to a T
      def rowConverter(row: Int): T = {
        converter(splitLines(row)(index))
      }
      DenseVector.tabulate(lines.size)(row => rowConverter(row))
    }

Notice that:

  • the whole second line (in your original code) is the body of the method
  • tabulate is a method taking two sets of parameters
  • the second set of parameters of tabulate is a single parameter which is a "lambda" function
  • the ClassTag part is called a "context bound": it is a way to say that the method needs an implicit value of type ClassTag[T], that is, the given type parameterized with the method's type parameter. ClassTag itself is a way to preserve information about the type at runtime (which would otherwise be lost due to "type erasure" on the JVM).
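The points above can be tried out with a minimal, self-contained sketch. It makes several assumptions so that it runs without breeze: `Array` stands in for `DenseVector` (`Array.tabulate` also has two parameter lists and also requires a `ClassTag`), and `lines`, `splitLines`, `ContextBoundDemo`, `fromListExplicit` and `runtimeClassName` are hypothetical names and sample data, not the book's.

```scala
import scala.reflect.ClassTag

object ContextBoundDemo {
  // Hypothetical stand-ins for the book's `lines` and `splitLines`.
  val lines: Vector[String] = Vector("1,2.5", "3,4.5")
  val splitLines: Vector[Array[String]] = lines.map(_.split(","))

  // Context-bound form, shaped like the function in the question.
  def fromList[T: ClassTag](index: Int, converter: String => T): Array[T] =
    Array.tabulate(lines.size) { row => converter(splitLines(row)(index)) }

  // The same method with the context bound written out as an implicit parameter.
  def fromListExplicit[T](index: Int, converter: String => T)(implicit tag: ClassTag[T]): Array[T] =
    Array.tabulate(lines.size)(row => converter(splitLines(row)(index)))

  // The ClassTag carries the runtime class that erasure would otherwise discard.
  def runtimeClassName[T: ClassTag]: String =
    implicitly[ClassTag[T]].runtimeClass.getName

  def main(args: Array[String]): Unit = {
    println(fromList[Int](0, _.toInt).mkString(", "))               // prints 1, 3
    println(fromListExplicit[Double](1, _.toDouble).mkString(", ")) // prints 2.5, 4.5
    println(runtimeClassName[String])                               // prints java.lang.String
  }
}
```

Callers never pass the `ClassTag` themselves; the compiler supplies it wherever the concrete type is known at the call site.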
Gaël J