i use spark-shell to run a simple code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val test_data = sqlContext.read.json("music.json")
test_data.registerTempTable("test_data")
val temp1 = sqlContext.sql("select user.id_str as userid, text from test_data")
val temp2 = temp1.map(t => (t.getAs[String]("userid"),t.getAs[String]("text").split('@').length-1))
Until here,it is all right. Then i want to save the result:
temp2.saveAsTextFile("test")
Then it comes out:
16/05/18 20:05:14 ERROR Executor: Exception in task 1.0 in stage 15.0 (TID 23)
java.lang.NullPointerException
at scala.collection.immutable.StringLike$class.split(StringLike.scala:201)
at scala.collection.immutable.StringOps.split(StringOps.scala:31)
at $line62.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:31)
at $line62.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:31)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1198)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
I have little experience using java. I wonder whether the problem is from my scala code or there is something wrong with config?