Just getting started with Spark and Scala. We've installed Spark 2 on our dev Cloudera Hadoop cluster, and I'm using spark2-shell. I'm working through a book to learn the basics. Its examples show println(foo) working without calling collect, but that's not working for me:
scala> val numbers = sc.parallelize(10 to 50 by 10)
numbers: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[9] at parallelize at <console>:24
scala> numbers.collect().foreach(println)
10
20
30
40
50
scala> numbers.foreach(x => println(x))
scala>
As you can see, nothing prints unless I do a collect() first.
What's going on? Is the book wrong, or is something funny with my Spark/Scala setup or config?
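If it helps narrow things down, I can also rerun the test with a local master to see whether the behavior changes (a sketch of what I'd try; I'm assuming spark2-shell accepts the same --master flag as the stock spark-shell):

```scala
// Start the shell with a local master instead of the cluster default
// (assumption: spark2-shell takes --master just like the regular spark-shell):
//   spark2-shell --master local[*]

val numbers = sc.parallelize(10 to 50 by 10)

// My guess is that with a local master everything runs in a single JVM,
// so the println output from foreach would actually appear in the shell,
// but I haven't confirmed that yet.
numbers.foreach(x => println(x))
```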
Version Info:
Spark version 2.0.0.cloudera2
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)