I want to apply an MD5 function to an RDD[(String, Array[Double])], but I get a NullPointerException. I found a related question on Stack Overflow: "call of distinct and map together throws NPE in spark library".
My code:
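
    import java.security.MessageDigest

    // Returns the first 8 hex characters of the MD5 digest of s.
    def md5(s: String) = {
      MessageDigest.getInstance("MD5").digest(s.getBytes)
        .map("%02x".format(_)).mkString.substring(0, 8)
    }

    val rdd = sc.makeRDD(Array(1, 8, 6, 4, 9, 3, 76, 4)) //.collect().foreach(println)
    val rdd2 = rdd.map(r => (r + "s", Array(1.0, 2.0)))
    // Prefix each key with the first 8 hex characters of its MD5 hash.
    rdd2.map {
      case (a, b) => (md5(a) + "_" + a, b)
    }.foreach(println)
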
In local mode this works, but in cluster mode it fails with:
java.lang.NullPointerException
Is there another way to do this? Thanks :)
Error:

    Exception in thread "main" java.lang.NullPointerException
        at no1.no1$.no1$no1$$md5$1(no1.scala:139)
        at no1.no1$$anonfun$8.apply(no1.scala:143)
        at no1.no1$$anonfun$8.apply(no1.scala:141)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at no1.no1$.main(no1.scala:141)
        at no1.no1.main(no1.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The code above is a simplified example, but it looks correct to me, so I am confused.
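
One variant I could try is sketched below. It rests on an assumption I have not confirmed: that the NPE comes from a null string reaching md5 (in the trace, md5 is called under ArrayOps.map invoked from main, that is, on a local array rather than inside an RDD task). The names Md5Util and md5Prefix are hypothetical, and the helper lives in a top-level object instead of being nested inside main.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper object (not part of my original program).
object Md5Util {
  // Returns the first 8 hex characters of the MD5 digest of s,
  // or None when s is null (assumed here to be the cause of the NPE).
  def md5Prefix(s: String): Option[String] =
    Option(s).map { str =>
      MessageDigest.getInstance("MD5")
        .digest(str.getBytes(StandardCharsets.UTF_8)) // explicit charset instead of the platform default
        .map("%02x".format(_))
        .mkString
        .substring(0, 8)
    }
}
```

With this, the mapping step could become `case (a, b) => (Md5Util.md5Prefix(a).getOrElse("null") + "_" + a, b)`, so a null key would show up in the output instead of throwing.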