
On Spark 1.6.2 (Scala 2.10.5) the following code worked just fine in the shell:

import org.apache.spark.mllib.linalg.Vector
case class DataPoint(vid: String, label: Double, features: Vector)

The mllib Vector correctly shadowed Scala's built-in Vector.

However, on Spark 2.0 (Scala 2.11.8) the same code throws the following error in the shell:

<console>:11: error: type Vector takes type parameters
  case class DataPoint(vid: String, label: Double, features: Vector)

To make it work, I now have to refer to the type by its fully qualified name:

case class DataPoint(vid: String, label: Double,
  features: org.apache.spark.mllib.linalg.Vector)

Can someone please tell me what changed, and is Spark or Scala at fault here? Thanks!

Roman
    They changed the way spark shell does imports, and there are outstanding bugs for it. Are you talking about running from shell? – som-snytt Sep 16 '16 at 21:46
  • @som-snytt yes I'm running from shell - thanks - updated the question. Okay so it is most likely a bug then. – Roman Sep 16 '16 at 21:50

1 Answer


The simplest solution to this problem is to use `:paste`:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_102)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vector

scala> case class DataPoint(vid: String, label: Double, features: Vector)
<console>:11: error: type Vector takes type parameters
       case class DataPoint(vid: String, label: Double, features: Vector)
                                                                  ^

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.mllib.linalg.Vector
case class DataPoint(vid: String, label: Double, features: Vector)

// Exiting paste mode, now interpreting.

import org.apache.spark.mllib.linalg.Vector
defined class DataPoint
zero323
  • thank you @zero323 - your solution does work! could you please also elaborate on what makes it work? – Roman Sep 16 '16 at 23:38
  • 2
    The difference compared to working line by line is that a whole block is compiled together. You could basically do the same thing by putting everything in the same block like `{import ....; case class DataPoint(...)}` (I know, not useful) or wrap with a single objects. But if you ask how to fix this upstream I have no idea. Spark tinkers with shell in serious ways and there quite a few ugly bugs there including [case class monster](http://stackoverflow.com/q/35301998/1560062). – zero323 Sep 17 '16 at 10:40
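The mechanism zero323 describes can be illustrated without Spark at all. In the sketch below, the `linalg` object and its `Vector` trait are hypothetical stand-ins for `org.apache.spark.mllib.linalg`; the point is only that when the import and the case class definition are compiled together as one unit, the imported `Vector` shadows `scala.collection.immutable.Vector`, so the unparameterized name resolves correctly:

```scala
// Stand-in for org.apache.spark.mllib.linalg (hypothetical, for illustration)
object linalg {
  trait Vector { def size: Int }
  case class DenseVector(values: Array[Double]) extends Vector {
    def size: Int = values.length
  }
}

// Import and definition compiled in the same scope: the imported Vector
// shadows scala.collection.immutable.Vector, so no type parameter is needed.
object Demo {
  import linalg.Vector
  case class DataPoint(vid: String, label: Double, features: Vector)
}

val p = Demo.DataPoint("v1", 1.0, linalg.DenseVector(Array(1.0, 2.0)))
println(p.features.size) // prints 2
```

This mirrors what `:paste` does in the shell: the whole pasted block is compiled as one unit, instead of each line being wrapped and compiled separately.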