I am trying to debug a very simple Spark Scala word count program. Since Spark is lazy, I assumed I need to put the breakpoint at an action statement and run that line; after that, I should be able to inspect the RDD variables defined before it and look at their data. So I put a breakpoint at line 14 (marked with a comment in the code below), and when the debugger stopped there I hit Step Over to execute that line. However, after doing that I cannot find any data for the variables text1 and text2 in the debug session's Variables view (although I can see the data inside the "all" variable). Am I doing this right? Why can't I see the data in text1 and text2?
Suppose my wordCount.txt looks like this:
This is a text file with words aa aa bb cc cc
I expect to see (aa,2),(bb,1),(cc,2) etc. somewhere in the Variables view for text2, but I cannot find anything like that there. See the screenshot below the code.
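To make that expectation concrete, here is a minimal sketch that models the same three steps (split, pair with 1, sum per word) using plain Scala collections instead of RDDs. It does not use Spark at all, and the names ExpectedWordCount, pairs and counts are hypothetical, introduced only for illustration. Unlike RDDs, local collections are evaluated eagerly, so their contents are the kind of data I was hoping to see for text1 and text2.

```scala
object ExpectedWordCount {
  // Stand-in for text1: split each line on spaces and pair every word with 1.
  // This is an eager local Seq, not an RDD.
  def pairs(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split(" ")).map(word => (word, 1))

  // Stand-in for text2: sum the 1s per word, which is what
  // reduceByKey((v1, v2) => v1 + v2) computes for each key.
  def counts(lines: Seq[String]): Map[String, Int] =
    pairs(lines)
      .groupBy { case (word, _) => word }
      .map { case (word, ps) => (word, ps.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Sample input taken from the question's wordCount.txt.
    val lines = Seq("This is a text file with words aa aa bb cc cc")
    counts(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

For the sample line, counts yields 2 for aa, 1 for bb and 2 for cc, along with a count of 1 for each of the other words.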
I am using Eclipse Neon and Spark 2.1, and this is a local Eclipse debug session. Any help would be really appreciated, as I could not find any information after an extensive search. Here's my code:
package Big_Data.Spark_App

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("WordCountApp").setMaster("local")
    val sc = new SparkContext(conf)
    val text = sc.textFile("/home/cloudera/Downloads/wordCount.txt")
    val text1 = text.flatMap(rec => rec.split(" ")).map(rec => (rec, 1))
    val text2 = text1.reduceByKey((v1, v2) => v1 + v2).cache
    val all = text2.collect() //line 14
    all.foreach(println)
  }
}
Here is the debug Variables view, which shows no actual data in the text2 variable.