I have an RDD[(K, Iterable[V])].
How can I convert it to an RDD[(K, RDD[V])] without collecting the data on the driver?
EDIT:
// Load the grouped data, collect it to the driver and re-parallelize each group as its own RDD
val dataList = DataLoader.loadTrainTestData(hiveContext.sql(sampleDataHql))
  .collect()
  .map(ds => (ds._1, sc.parallelize(ds._2.toSeq)))

// Train and test one model per group (parallelizing an array of RDDs creates a nested RDD)
val resultData = sc.parallelize(dataList).map { ds =>
  val remark = ds._1
  // 60% training, 40% test, with a fixed seed
  val data = ds._2.randomSplit(Array(0.6, 0.4), seed = 11L)
  val model = new LogisticRegressionWithLBFGS()
    .setNumClasses(2)
    .setIntercept(true)
    .run(data(0))
  val trainAUC = ModelTester.getAUC(data(1), model)
  val modelWeight = model.weights.toArray.map(_.toString).reduce(_ + "_" + _)
  // One "|"-separated summary line per group
  modelWeight + "|" + model.intercept.toFloat + "|" + model.numClasses + "|" + model.numFeatures + "|" + trainAUC.toFloat + "|" + remark
}
This creates a nested RDD, but it first collects everything and re-parallelizes it on the driver, which takes up too much memory.
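Spark does not allow an RDD to be used inside another RDD's transformation, so `randomSplit` and `run` have to be called from the driver. Below is a minimal sketch of the same training step written that way. It assumes each dataset is an `RDD[LabeledPoint]` and, because the asker's `ModelTester.getAUC` helper is not shown, swaps in MLlib's `BinaryClassificationMetrics.areaUnderROC`. `PerKeyTraining`, `trainAndDescribe` and `trainAll` are hypothetical names.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object PerKeyTraining {
  // Trains and evaluates one model. Must run on the driver, because
  // randomSplit and run() launch Spark jobs of their own.
  def trainAndDescribe[K](remark: K, data: RDD[LabeledPoint]): String = {
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    val (training, testing) = (splits(0), splits(1))
    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .setIntercept(true)
      .run(training)
    // Assumed stand-in for ModelTester.getAUC: clear the threshold so that
    // predict() returns raw probabilities, then compute the ROC AUC.
    model.clearThreshold()
    val scoreAndLabels = testing.map(p => (model.predict(p.features), p.label))
    val auc = new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
    // Same "|"-separated summary as in the question
    Seq(model.weights.toArray.mkString("_"), model.intercept.toFloat, model.numClasses,
      model.numFeatures, auc.toFloat, remark).mkString("|")
  }

  // Maps over an ordinary driver-side Seq, not an RDD, so no RDD is
  // referenced inside another RDD's closure.
  def trainAll[K](datasets: Seq[(K, RDD[LabeledPoint])]): Seq[String] =
    datasets.map { case (remark, data) => trainAndDescribe(remark, data) }
}
```

Because `trainAll` iterates over a plain Scala collection, the models are trained one after another, each as its own set of distributed Spark jobs, however the per-key RDDs were obtained.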
EDIT 2:
Alternatively, how can I convert an RDD[(K, Iterable[V])] to an Array[(K, RDD[V])] without collecting on the driver?
Or, more generally, what is a better way to train models on multiple datasets?
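If every dataset fits comfortably in one executor's memory, a different technique avoids per-key RDDs altogether: keep the data as RDD[(K, Iterable[V])] and train a small in-memory model per key inside `mapValues`, so all keys are trained in parallel within a single job. MLlib's `LogisticRegressionWithLBFGS` cannot be used there because it expects an RDD, so this sketch swaps in a hypothetical minimal logistic regression fitted by plain batch gradient descent (no regularization, fixed iteration count). `LocalLogReg`, `train`, `iterations` and `stepSize` are assumptions, not a library API.

```scala
// Hypothetical minimal logistic regression, trained entirely in memory on one key's data.
object LocalLogReg {
  final case class Model(weights: Vector[Double], intercept: Double) {
    // Probability that the label is 1.0 for the feature vector x
    def predict(x: Array[Double]): Double = {
      val z = intercept + weights.indices.map(i => weights(i) * x(i)).sum
      1.0 / (1.0 + math.exp(-z))
    }
  }

  // points: (label in {0.0, 1.0}, features); batch gradient descent on the mean log-loss
  def train(points: Seq[(Double, Array[Double])],
            iterations: Int = 500,
            stepSize: Double = 0.1): Model = {
    require(points.nonEmpty, "need at least one point")
    val dim = points.head._2.length
    val n = points.size.toDouble
    val w = Array.fill(dim)(0.0)
    var b = 0.0
    for (_ <- 0 until iterations) {
      val current = Model(w.toVector, b)
      val gradW = Array.fill(dim)(0.0)
      var gradB = 0.0
      // Accumulate the gradient: (prediction - label) * features
      for ((label, x) <- points) {
        val err = current.predict(x) - label
        for (i <- 0 until dim) gradW(i) += err * x(i)
        gradB += err
      }
      for (i <- 0 until dim) w(i) -= stepSize * gradW(i) / n
      b -= stepSize * gradB / n
    }
    Model(w.toVector, b)
  }
}
```

With data grouped as `RDD[(K, Iterable[LabeledPoint])]`, the call would look like `grouped.mapValues(ps => LocalLogReg.train(ps.toSeq.map(p => (p.label, p.features.toArray))))`. The trade-off is that each key's data must fit on a single executor, in exchange for one job instead of one job per key.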
EDIT 3:
Nested RDDs are indeed not allowed, thanks!
Is there a way to create an Array[(K, RDD[V])] without collecting the data on the driver?
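One way to get an Array[(K, RDD[V])] is to collect only the distinct keys, which are usually few, and build each per-key RDD by filtering the distributed data. The sketch below works under that assumption; `SplitByKey`, `splitByKey` and `splitGrouped` are hypothetical names. Every per-key RDD re-scans the source, so the source is persisted first, and the approach suits a modest number of keys. Starting from the flat RDD[(K, V)] rather than the grouped one also avoids materializing each group as a single Iterable.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import scala.reflect.ClassTag

object SplitByKey {
  // Builds one RDD per key. Only the distinct keys are collected to the
  // driver; the values themselves stay distributed.
  def splitByKey[K: ClassTag, V: ClassTag](pairs: RDD[(K, V)]): Array[(K, RDD[V])] = {
    // Each per-key RDD below re-scans `pairs`, so cache it once
    // (unless the caller has already chosen a storage level).
    if (pairs.getStorageLevel == StorageLevel.NONE) pairs.persist(StorageLevel.MEMORY_AND_DISK)
    val keys = pairs.keys.distinct().collect()
    keys.map { k =>
      (k, pairs.filter { case (key, _) => key == k }.values)
    }
  }

  // Same idea when the data has already been grouped: flatten the Iterables first.
  def splitGrouped[K: ClassTag, V: ClassTag](grouped: RDD[(K, Iterable[V])]): Array[(K, RDD[V])] =
    splitByKey(grouped.flatMapValues(vs => vs))
}
```

The returned array lives on the driver and can be iterated there, e.g. `SplitByKey.splitByKey(pairs).map { case (k, rdd) => ... }`, with the original training body inside that closure, since it no longer runs inside another RDD.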