
I'm a Scala newbie, using PySpark extensively (on Databricks, FWIW). I'm finding that protobuf deserialization is too slow for me in Python, so I'm porting my deserialization UDF to Scala.

I've compiled my .proto files to Scala and then to a JAR using ScalaPB, as described here.
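For reference, a minimal sketch of the sbt wiring (assuming the standard sbt-protoc setup from the ScalaPB docs; the version numbers are illustrative placeholders):

// project/plugins.sbt -- sbt-protoc drives the code generation
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.13"

// build.sbt -- generate Scala case classes from src/main/protobuf and
// pull in the Spark integration that provides ProtoSQL and its implicits
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
libraryDependencies += "com.thesamet.scalapb" %% "sparksql-scalapb" % "1.0.1"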

When I try to use these instructions to create a UDF like this:

import gnmi.gnmi._
import org.apache.spark.sql.{Dataset, DataFrame, functions => F}
import spark.implicits.StringToColumn
import scalapb.spark.ProtoSQL

// import scalapb.spark.ProtoSQL.implicits._
import scalapb.spark.Implicits._

val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }

I get the following error:

command-4409173194576223:9: error: could not find implicit value for evidence parameter of type frameless.TypedEncoder[Array[Byte]]
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }

I've double-checked that I'm importing the correct implicits, to no avail. I'm pretty fuzzy on implicits, evidence parameters, and Scala in general.

I would really appreciate it if someone would point me in the right direction. I don't even know how to start diagnosing!

Update

It seems like frameless doesn't include an implicit encoder for Array[Byte]? This works:

frameless.TypedEncoder[Byte]

this does not:

frameless.TypedEncoder[Array[Byte]]

The code for frameless.TypedEncoder seems to include a generic Array encoder, but I'm not sure I'm reading it correctly.

@Dmytro, thanks for the suggestion. That helped.

Does anyone have ideas about what is going on here?

Update

OK, progress: this looks like a Databricks issue. I think that the notebook does something like the following on startup:

import spark.implicits._

I'm using scalapb, which requires that you don't do that.

I'm now hunting for a way to disable that automatic import, or to "unimport" or shadow those implicits after they get imported.

user961826
  • There are ways to debug implicits (at compile time): https://stackoverflow.com/questions/59348301/in-scala-2-or-3-is-it-possible-to-debug-implicit-resolution-process-in-runtime/ If you're certain which implicits to use, you can resolve them manually and see the new compile error. – Dmytro Mitin Dec 03 '22 at 08:37
  • @DmytroMitin Thanks! That is helpful. I added some more info. Any other thoughts? – user961826 Dec 03 '22 at 22:25
  • Did you have a chance to check whether "unimporting" like in my answer helps? – Dmytro Mitin Dec 13 '22 at 11:54
  • Do you have any updates? – Dmytro Mitin Dec 20 '22 at 03:41
  • Thanks for helping out! The solution below doesn't seem to work, sadly. I'm working around it by building a map and writing that out instead, but it's a pretty inefficient process. – user961826 Jan 04 '23 at 16:31

1 Answer


If spark.implicits._ is already imported, then a way to "unimport" it (hide or shadow those implicits) is to create a duplicate object and import it too:

import org.apache.spark.sql.{SQLContext, SQLImplicits}

object implicitShadowing extends SQLImplicits with Serializable {
  // never invoked; the object exists only so its members can shadow spark.implicits._
  protected override def _sqlContext: SQLContext = ???
}

import implicitShadowing._

Testing with case class Person(id: Long, name: String):

// 1. No imports
List(Person(1, "a")).toDS() // doesn't compile: value toDS is not a member of List[Person]

// 2. Only spark.implicits._
import spark.implicits._
List(Person(1, "a")).toDS() // compiles

// 3. spark.implicits._ followed by the shadowing import
import spark.implicits._
import implicitShadowing._
List(Person(1, "a")).toDS() // doesn't compile: value toDS is not a member of List[Person]
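The idea is that the two wildcard imports have the same precedence, so every implicit they both provide (toDS, the encoders, etc.) becomes ambiguous and the compiler can use neither of them, which is effectively an unimport.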

See also:

  • How to override an implicit value?
  • Wildcard Import, then Hide Particular Implicit?
  • How to override an implicit value, that is imported?
  • How can an implicit be unimported from the Scala repl?
  • Not able to hide Scala Class from Import
  • NullPointerException on implicit resolution
  • Constructing an overridable implicit
  • Caching the circe implicitly resolved Encoder/Decoder instances
  • Scala implicit def do not work if the def name is toString
  • Is there a workaround for this format parameter in Scala?

Please check whether this helps.

A possible problem: you don't want to just unimport spark.implicits._; you're using scalapb.spark.Implicits._, and you probably want scalapb.spark.ProtoSQL.implicits._ imported too. And I don't know whether implicitShadowing._ shadows some of them as well.
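A sketch of what that combination might look like (untested; whether implicitShadowing._ also hides the scalapb encoders is exactly the open question above):

import gnmi.gnmi._                          // the generated SubscribeResponse
import scalapb.spark.ProtoSQL

import implicitShadowing._                  // neutralize the notebook's spark.implicits._
import scalapb.spark.ProtoSQL.implicits._   // bring back only scalapb's encoders

val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] =>
  SubscribeResponse.parseFrom(bytes)
}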

Another possible workaround is to resolve implicits manually and use them explicitly.
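For example, summoning the needed instances one by one in a scratch cell shows which resolution step fails (every identifier below is named in the question; implicitly is standard Scala):

import gnmi.gnmi._
import frameless.TypedEncoder
import scalapb.spark.Implicits._

implicitly[TypedEncoder[Byte]]               // resolves, per the question
implicitly[TypedEncoder[Array[Byte]]]        // the failing one; the compile error here is more direct
implicitly[TypedEncoder[SubscribeResponse]]  // encoder for the generated message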

Dmytro Mitin