Sorry for the confusion in the initial question. Here is the question again with a reproducible example:
I have an RDD of [String] and an RDD of [(String, Long)]. I would like to get an RDD of [Long] by matching the String of the second RDD against the String of the first. Example:
//Create RDD
val textFile = sc.parallelize(Array("Spark can also be used for compute intensive tasks",
"This code estimates pi by throwing darts at a circle"))
// tokenize, result: RDD[String]
val words = textFile.flatMap(line => line.split(" "))
// create an index of the distinct words, result: RDD[(String, Long)]
val indexWords = words.distinct().zipWithIndex()
As a result, I would like to have an RDD that contains the indexes of the words, instead of the words themselves, for the sentence "Spark can also be used for compute intensive tasks".
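To make the desired result concrete, here is a minimal sketch of one possible way to express it, not necessarily the best one. It assumes Spark 1.3 or later (so the pair-RDD implicits are available without extra imports) and a local master for illustration. The names `WordsToIndexes` and `toIndexes` are hypothetical. The sketch tags each word with its position, joins on the word, and sorts by position to restore the sentence order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordsToIndexes {
  // Hypothetical helper: replaces each word by its index from `indexWords`,
  // keeping the original word order. Because join is an inner join, words
  // that are missing from `indexWords` are dropped.
  def toIndexes(words: RDD[String], indexWords: RDD[(String, Long)]): RDD[Long] =
    words
      .zipWithIndex()                  // (word, position)
      .join(indexWords)                // (word, (position, wordIndex))
      .map { case (_, (position, wordIndex)) => (position, wordIndex) }
      .sortByKey()                     // restore the sentence order
      .values                          // keep only the word indexes

  def main(args: Array[String]): Unit = {
    // Local context for illustration only; in spark-shell `sc` already exists.
    val conf = new SparkConf().setAppName("words-to-indexes").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val textFile = sc.parallelize(Seq(
      "Spark can also be used for compute intensive tasks",
      "This code estimates pi by throwing darts at a circle"))
    val indexWords = textFile.flatMap(_.split(" ")).distinct().zipWithIndex()

    val sentence = sc.parallelize(Seq("Spark can also be used for compute intensive tasks"))
    val sentenceIndexes = toIndexes(sentence.flatMap(_.split(" ")), indexWords)

    // The concrete index values depend on how distinct() orders the words.
    println(sentenceIndexes.collect().mkString(", "))
    sc.stop()
  }
}
```

The join shuffles both RDDs. If the word index is small enough to fit in memory, collecting it into a broadcast Map and using a plain `map` over the words may be a cheaper alternative.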
Sorry again and thanks