
In a Scala REPL, the following code

import scala.beans.BeanProperty

class EmailAccount {
  @BeanProperty var accountName: String = null

  override def toString: String = s"acct ($accountName)"
}
classOf[EmailAccount].getDeclaredConstructor()

results in

res0: java.lang.reflect.Constructor[EmailAccount] = public EmailAccount()

However, in Spark's REPL I get:

java.lang.NoSuchMethodException: EmailAccount.<init>()
  at java.lang.Class.getConstructor0(Class.java:2810)
  at java.lang.Class.getDeclaredConstructor(Class.java:2053)
  ... 48 elided

What causes this discrepancy? How can I get the Spark shell to match the behavior of the plain Scala REPL?

I launched the REPLs like so:

/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar

and

scala -classpath "/home/placey/snakeyaml-1.17.jar"

The Scala versions are as follows. Spark:

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)

Standalone Scala:

Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55).

1 Answer


Actually, this isn't specific to scala.beans.BeanProperty or even to Spark. You can get the same behaviour in the standard Scala REPL by running it with the -Yrepl-class-based flag:

scala -Yrepl-class-based

Now, let's try defining a simple empty class:

scala> class Foo()
defined class Foo

scala> classOf[Foo].getConstructors
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))

scala> classOf[Foo].getFields
res1: Array[java.lang.reflect.Field] = Array(public final $iw Foo.$outer)

As you can see, the REPL modified your class on the fly, adding an extra field and an extra constructor parameter. Why?

Whenever you create a val or var in the Scala REPL, it gets wrapped in a special object, because there is no such thing as a "global variable" in Scala. See this answer.

Normally, this wrapper is an object, so it is available globally. With -Yrepl-class-based, however, the REPL uses class instances instead of a single global object. This feature was introduced by the Spark developers because Spark needs classes to be serializable so that they can be sent to remote workers (see this pull request).
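As a rough illustration of the difference, here is a simplified sketch; the REPL's real wrappers are named $iw and are nested more deeply, so the names below are made up:

// Simplified sketch of the two wrapping schemes. The real REPL wrapper
// is called $iw; ObjectWrapper and ClassWrapper are illustrative names.

// Default mode: definitions go into a global object, so Foo behaves
// like an ordinary class with a public no-arg constructor.
object ObjectWrapper {
  class Foo()
}

// With -Yrepl-class-based: definitions go into a Serializable class, so
// Foo becomes an inner class whose constructor needs the enclosing instance.
class ClassWrapper extends Serializable {
  class Foo()
}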

Because of this, any class you define in the REPL needs to get the $iw instance; otherwise you would not be able to access the global vals and vars that you defined in the REPL. Additionally, the generated class automatically extends Serializable.
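You can reproduce the failure outside any REPL with a plain inner class, which gets the same hidden constructor parameter. A minimal, self-contained sketch (Wrapper stands in for the REPL's $iw):

// Wrapper plays the role of the REPL's $iw; EmailAccount stands in for
// a class defined at the spark-shell prompt.
class Wrapper extends Serializable {
  class EmailAccount
}

object ConstructorDemo extends App {
  val cls = classOf[Wrapper#EmailAccount]

  // The only constructor takes the hidden outer instance:
  println(cls.getDeclaredConstructors.head)
  // prints: public Wrapper$EmailAccount(Wrapper)

  // Asking for a zero-argument constructor throws
  // java.lang.NoSuchMethodException, exactly as in spark-shell:
  // cls.getDeclaredConstructor()
}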

I'm afraid you can't do anything to prevent this: spark-shell enables -Yrepl-class-based by default. Even if there were an option to disable this behaviour, you would quickly run into other problems, because your classes would no longer be serializable, yet Spark needs to serialize them.
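If a library genuinely requires a no-argument constructor (bean-oriented libraries such as SnakeYAML typically instantiate classes reflectively), one way to sidestep the wrapping, rather than prevent it, is to compile the class outside the REPL and put it on the classpath. A sketch, using the hypothetical jar name email-account.jar:

// EmailAccount.scala -- compiled ahead of time, so the REPL never wraps it
import scala.beans.BeanProperty

class EmailAccount {
  @BeanProperty var accountName: String = null

  override def toString: String = s"acct ($accountName)"
}

Compile it and pass the resulting jar to spark-shell alongside your other jars:

scalac EmailAccount.scala
jar cf email-account.jar EmailAccount.class
spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar,email-account.jar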
