
I have a function like this in Scala (2.13) for use with Spark:

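import org.apache.spark.sql.Dataset
import scala.reflect.runtime.universe.TypeTag

// assumes `spark: SparkSession` and `BASE_PATH: String` are in scope
def getDataset[T <: Product: TypeTag](name: String): Dataset[T] = {
    import spark.implicits._

    val ds = spark.read.parquet(BASE_PATH + "/" + name).as[T]
    ds.createOrReplaceTempView(name)
    ds
}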

Now I want to take a `Seq` of case classes and, for each class, call this function:

case class CLASS1(...)
case class CLASS2(...)
case class CLASS3(...)

Seq(CLASS1, CLASS2, CLASS3, ....).foreach {
  c => getDataset[c??](name=c???)
}

I'm having a hard time figuring out the exact syntax; the symbol for the name of the case class, represented by the variable c inside the foreach, seems to represent the type of the apply method (() => Product). What I really want is the type of the case class to use as the type parameter, and the name of the case class.

It feels like I should be able to do this - what am I missing here?

Update: it looks like it's possible to get the name of the type used in a type parameter at runtime, via `TypeTag`.

The solution I am converging on is something like this:

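import org.apache.spark.sql.Dataset
import scala.reflect.runtime.universe.{TypeTag, typeTag}

def getDataset[T <: Product: TypeTag]: Dataset[T] = {
    import spark.implicits._

    // simple name of the type argument, e.g. "CLASS1"
    val name = typeTag[T].tpe.typeSymbol.name.toString
    val ds = spark.read.parquet(BASE_PATH + "/" + name).as[T]
    ds.createOrReplaceTempView(name)
    ds
}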

Then call it as something like `Seq(getDataset[CLASS1], getDataset[CLASS2], ...)`.

Not what I hoped for, but at least I can cut out the copy-paste of the class name and string.
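The key point of the update is that the name is derived from the type argument alone, so each class is written exactly once. Below is a minimal, runnable sketch of just that name-derivation step. It assumes `ClassTag` (from scala-library) in place of `TypeTag`, so it needs neither Spark nor scala-reflect. `datasetName`, `Class1` and `Class2` are hypothetical names.

```scala
import scala.reflect.ClassTag

object TypeNameSketch {
  // Hypothetical case classes standing in for CLASS1, CLASS2, ...
  case class Class1(id: Int)
  case class Class2(label: String)

  // Derives a name from the type argument alone; no String argument needed.
  // Assumption: ClassTag is used here instead of TypeTag to avoid scala-reflect.
  def datasetName[T <: Product](implicit ct: ClassTag[T]): String = {
    // e.g. "TypeNameSketch$Class1" for a class nested in an object,
    // or "com.example.Class1" for a top-level class in a package
    val fullName = ct.runtimeClass.getName
    // keep what follows the last package dot, then the last nesting '$'
    fullName.substring(fullName.lastIndexOf('.') + 1).split('$').last
  }
}
```

With `TypeTag`, as in the update, `typeTag[T].tpe.typeSymbol.name.toString` already yields the simple name, so the string trimming is only needed in this `ClassTag` variant. The real Spark version still needs the `TypeTag` bound, because `spark.implicits._` uses it to derive the `Encoder` for `.as[T]`.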

wrschneider
  • How are you even constructing this `Seq` in Scala 2? All collections in Scala 2 are homogeneous. You can't have a sequence of elements where each element is an instance of a different class – sinanspd Oct 16 '21 at 19:02
  • Is this a `Seq[Class[_]]` maybe? – Gaël J Oct 16 '21 at 19:17
  • Ah ok, so they are all of the same type; it's that the seq might be a `Seq[A]` for any `A`. Writing `class1, class2, class3` was a little bit confusing. Does `c.getClass` not suffice here for some reason? – sinanspd Oct 16 '21 at 19:26
  • No, it's the case class names, but apparently the symbol for the case class name is being interpreted as the apply function rather than as the class, so it's a homogeneous `Seq[() => Product]` – wrschneider Oct 16 '21 at 19:30
  • @wrschneider In `Seq(Class1, Class2, Class3)`, `Class1`, `Class2`, `Class3` are the companion objects of case classes. – Dmytro Mitin Oct 16 '21 at 20:36

2 Answers


You could define your own companion objects for the case classes and include a method in each which calls getDataset. For example, this should work (passed by my mental compiler):

abstract class DatasetProvider[T <: Product : TypeTag] {
  val name: String
  def dataset: Dataset[T] =
    getDataset[T](name)
}

case class Class1(...)

object Class1 extends DatasetProvider[Class1] {
  override val name: String = "class1"
}

// and so forth for Class2, Class3

Seq(Class1, Class2, Class3).foreach { c =>
  val ds = c.dataset
  ???
}

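Note that once you define your own companion object, it no longer automatically extends the function type of the case class constructor (for example `Int => Class1`). So you have to mark it as a function explicitly if you want to use it as one, e.g. to pass `Class1` directly to `map`. This may or may not be desirable.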
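To check that the companion-object pattern type-checks without Spark, here is a runnable sketch. It assumes a hypothetical stub `getDataset` that records the requested name and returns a `String` instead of a `Dataset[T]`, so the `TypeTag` bound is dropped here. `Class1` and `Class2` are illustrative.

```scala
import scala.collection.mutable.ListBuffer

object CompanionProviderSketch {
  // Hypothetical stub: records which names were requested.
  // The real getDataset returns Dataset[T] and needs T <: Product : TypeTag.
  val requested: ListBuffer[String] = ListBuffer.empty

  def getDataset[T <: Product](name: String): String = {
    requested += name
    s"dataset($name)"
  }

  // Each companion fixes T once, so callers never spell the type again.
  abstract class DatasetProvider[T <: Product] {
    val name: String
    def dataset: String = getDataset[T](name)
  }

  case class Class1(id: Int)
  object Class1 extends DatasetProvider[Class1] {
    override val name: String = "class1"
  }

  case class Class2(label: String)
  object Class2 extends DatasetProvider[Class2] {
    override val name: String = "class2"
  }

  // The Seq is homogeneous: every element is some DatasetProvider.
  val providers: Seq[DatasetProvider[_ <: Product]] = Seq(Class1, Class2)

  def loadAll(): Seq[String] = providers.map(_.dataset)
}
```

The explicit element type `DatasetProvider[_ <: Product]` avoids relying on the compiler's inferred least upper bound of the two companion types.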

Dmytro Mitin
Levi Ramsey
  • I replaced `trait DatasetProvider[T]` with `abstract class DatasetProvider[T <: Product : TypeTag]` in order to satisfy `getDataset` bounds. – Dmytro Mitin Oct 16 '21 at 23:20
  • Any resemblance between my mental compiler and the actual compiler is coincidental :) – Levi Ramsey Oct 16 '21 at 23:22

The problem is that you want to substitute T (known at compile time) at type level and name (known at runtime) at value level.

Normally T and name do not exist at the same time.

One option is to replace `Seq(Class1, Class2, Class3)` on the value level with `Class1 :: Class2 :: Class3 :: HNil` on the type level and use Shapeless:

import shapeless.{::, HNil, Poly0, Poly1, Typeable}
import shapeless.ops.hlist.FillWith
import scala.reflect.runtime.universe.{TypeTag, typeOf}

object datasetPoly extends Poly1 {
  implicit def cse[T <: Product : TypeTag /*: Typeable*/]: Case.Aux[T, Dataset[T]] = 
    at(_ => getDataset[T](/*Typeable[T].describe*/typeOf[T].toString))
}

object nullPoly extends Poly0 {
  implicit def cse[T >: Null]: Case0[T] = at(null)
}

FillWith[nullPoly.type, Class1 :: Class2 :: Class3 :: HNil].apply().map(datasetPoly)
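Without Shapeless, another way to make `T` and `name` coexist is to pack each type argument, together with its implicit evidence, into an ordinary value. The sketch below makes several assumptions:

- `TypedSource` and the stub `getDataset` are hypothetical names.
- `ClassTag` stands in for `TypeTag`, so it runs without Spark or scala-reflect.
- The stub returns a `String` rather than a `Dataset[T]`.

```scala
import scala.reflect.ClassTag

object TypedSourceSketch {
  // Hypothetical stub for getDataset: it needs T statically plus its evidence.
  def getDataset[T <: Product](name: String)(implicit ct: ClassTag[T]): String =
    s"$name:${ct.runtimeClass.getName}"

  // Hypothetical wrapper that fixes T (and captures its ClassTag) on construction.
  final class TypedSource[T <: Product](val name: String)(implicit ct: ClassTag[T]) {
    // T is a real type parameter inside the class, so this call type-checks.
    def load(): String = getDataset[T](name)
  }

  case class Class1(id: Int)
  case class Class2(label: String)

  // Homogeneous Seq, yet each element still carries its own T via its ClassTag.
  val sources: Seq[TypedSource[_ <: Product]] =
    Seq(new TypedSource[Class1]("class1"), new TypedSource[Class2]("class2"))

  def loadAll(): Seq[String] = sources.map(_.load())
}
```

Each class still has to be named once per element, as in the asker's `Seq(getDataset[CLASS1], ...)`. The difference is that loading is deferred until `load()` is called.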

Alternatively, you can use macros or runtime reflection. In `Seq(Class1, Class2, Class3)`, `Class1`, `Class2`, `Class3` are the companion objects of the case classes. For example, with a reflective toolbox (here `App` is assumed to be the object in which `getDataset` is defined):

import scala.reflect.runtime.universe.Quasiquote
import scala.reflect.runtime.{currentMirror => cm}
import scala.tools.reflect.ToolBox

val tb = cm.mkToolBox()

Seq(Class1, Class2, Class3).foreach(c => {
  val classSymbol = cm.reflect(c).symbol.companion
  tb.eval(q"App.getDataset[$classSymbol](${classSymbol.name.toString})")
})

You need to add the following to build.sbt:

libraryDependencies += scalaOrganization.value % "scala-reflect" % scalaVersion.value
libraryDependencies += scalaOrganization.value % "scala-compiler" % scalaVersion.value
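The next sketch shows why a toolbox (or a macro) is needed. Plain Java reflection on the companion values recovers each case class *name*, but only as a `String`, which cannot be used as a type argument. `CompanionNameSketch` and `caseClassName` are hypothetical names for illustration.

```scala
object CompanionNameSketch {
  case class Class1(id: Int)
  case class Class2(label: String)

  // Each element is a companion object; its runtime class is e.g.
  // "CompanionNameSketch$Class1$" (note the trailing '$' of a module class).
  val companions: Seq[AnyRef] = Seq(Class1, Class2)

  // Recover the case class's simple name from its companion object.
  def caseClassName(companion: AnyRef): String = {
    val fullName = companion.getClass.getName.stripSuffix("$")
    fullName.substring(fullName.lastIndexOf('.') + 1).split('$').last
  }

  // Only Strings come back here; a String cannot be passed as getDataset[T]'s T,
  // which is why the toolbox approach compiles the call at runtime instead.
  val names: Seq[String] = companions.map(caseClassName)
}
```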
Dmytro Mitin
  • I'm going to accept this answer since I think the hard problem here is simply that you can't specify type parameters at runtime without reflection, and I need to find another option. – wrschneider Oct 18 '21 at 15:51