I want to declare a function that takes the cogroup of two RDDs. It is effectively an interSectionByKey. The code below does not compile:
def getRetain[K, V](activeUserRdd: RDD[(K, V)], newUserRdd: RDD[(K, V)]): RDD[(K, V)] = {
  activeUserRdd.cogroup(newUserRdd).flatMapValues {
    x => Option((if (!x._1.isEmpty && !x._2.isEmpty) x._2.head else null).asInstanceOf[V])
  }
}
Error:
value cogroup is not a member of org.apache.spark.rdd.RDD[(K, V)]
I think the (K, V) in my signature does not match the [(K, V)] declared in cogroup, but what is the right way to declare it in my function?