
I use spark-shell to run some simple code:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val test_data = sqlContext.read.json("music.json")
test_data.registerTempTable("test_data")
val temp1 = sqlContext.sql("select user.id_str as userid, text from test_data")
val temp2 = temp1.map(t => (t.getAs[String]("userid"),t.getAs[String]("text").split('@').length-1))

Up to this point everything works. Then I want to save the result:

temp2.saveAsTextFile("test")

Then I get:

    16/05/18 20:05:14 ERROR Executor: Exception in task 1.0 in stage 15.0 (TID 23)
java.lang.NullPointerException
    at scala.collection.immutable.StringLike$class.split(StringLike.scala:201)
    at scala.collection.immutable.StringOps.split(StringOps.scala:31)
    at $line62.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:31)
    at $line62.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:31)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1198)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)

I have little experience with Java. Is the problem in my Scala code, or is something wrong with my configuration?

  • A hint: `t.getAs[String]("text").split('@').length-1` is not even remotely safe. – zero323 May 18 '16 at 12:46
  • I'd like to count the @ number in a string. What is the better to do it? – ZMath_lin May 18 '16 at 13:01
  • For starters make sure that null values are handled correctly. – zero323 May 18 '16 at 13:03
  • There are two possible reasons of NPE. 1. An empty line at the end of music.json file. sqlContext.read.json tries to read all lines when the action is called on the dataframe. 2. "text" is null or not present in one of the rows of music.json. {"user": {"id_str":"2"}, "text": null} OR {"user": {"id_str":"3"}} – Pranav Shukla May 18 '16 at 13:04
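As the comments point out, the exception is thrown only when `saveAsTextFile` forces evaluation, and the cause is `getAs[String]("text")` returning `null` for some row, so `split` is called on `null`. One possible null-safe rewrite of the mapping step (a sketch, reusing the `temp1` DataFrame from the question; it also counts `'@'` characters directly instead of `split('@').length - 1`, which gives the same count without the split):

```scala
// Null-safe version of the map: wrap possibly-null fields in Option
// so rows with a missing/null "text" or "userid" do not throw an NPE.
val temp2 = temp1.map { t =>
  val userid = Option(t.getAs[String]("userid")).getOrElse("")
  val mentionCount = Option(t.getAs[String]("text"))
    .map(_.count(_ == '@'))  // count '@' occurrences directly
    .getOrElse(0)            // treat null/missing text as zero mentions
  (userid, mentionCount)
}
temp2.saveAsTextFile("test")
```

Alternatively, filtering out null rows up front in the SQL query (e.g. `where text is not null`) avoids the guard in the map entirely.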

0 Answers