I have one big flat file containing three types of data ("M", "C", "Q"). I am creating one RDD from it using:
val inputRDD = sc.textFile("/user/train")
I then filter the data by applying three transformations to inputRDD:
val metaRDD = inputRDD.filter(line => line.contains("M"))
val clickRDD = inputRDD.filter(line => line.contains("C"))
val QueryRDD = inputRDD.filter(line => line.contains("Q"))
This will read the entire file three times when the three RDDs are used in an action. Is there a way to get the three RDDs by applying a single transformation to inputRDD, so that the file is read only once?
I am aware that the file will be read only once if we persist the dataset, but the file is too large to be persisted.
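For reference, the persist-based variant I am referring to would look roughly like the sketch below. It assumes a spark-shell session where `sc` already exists. The `MEMORY_AND_DISK` storage level is an assumption chosen for illustration: it spills partitions that do not fit in memory to local disk, and whether the executors have enough disk for this file is not something I can confirm.

```scala
import org.apache.spark.storage.StorageLevel

// Assumes an existing SparkContext named sc (e.g. inside spark-shell).
val inputRDD = sc.textFile("/user/train")

// Mark the lines for caching so the filters below can reuse them.
// MEMORY_AND_DISK is an assumed choice: partitions that do not fit
// in memory are written to the executors' local disk.
inputRDD.persist(StorageLevel.MEMORY_AND_DISK)

val metaRDD = inputRDD.filter(line => line.contains("M"))
val clickRDD = inputRDD.filter(line => line.contains("C"))
val QueryRDD = inputRDD.filter(line => line.contains("Q"))

// The first action reads the file and fills the cache;
// the later actions read the cached partitions instead of the file.
println(metaRDD.count())
println(clickRDD.count())
println(QueryRDD.count())

// Release the cached partitions once all three RDDs have been used.
inputRDD.unpersist()
```

This is exactly the approach I would like to avoid, since it relies on being able to store the whole data set somewhere.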