I would like to apply a regression to my data. One step of the workflow is to prepare my data as a JavaRDD, starting from a Dataset that includes its header. This is what I did:
== Step 1: transform the Dataset into a JavaRDD
JavaRDD<Row> dataPointsWithHeader = modelDS.toJavaRDD();
== Step 2: take the first row (I assumed it would be the header)
Row header = dataPointsWithHeader.first();
== Step 3: eliminate the header row by filtering it out
JavaRDD<Row> dataPointsWithoutHeader = dataPointsWithHeader.filter((Row row) -> {
    // Keep every row that is not equal to the presumed header row
    return !row.equals(header);
});
The issues with the above approach are:
a) the result of Step 2 is not the header row;
b) Step 3 is very inefficient, since it compares every row against the header, if there is a way to access the header directly.
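One thing I have not been able to confirm for my own data: if the Dataset already has named columns, the header may live in its schema rather than in any row, which would explain issue a). Below is a minimal sketch of that assumption. The class name HeaderFromSchema, the helpers sampleDataset and headerOf, and the two columns "x" and "y" are hypothetical and only stand in for my real modelDS; the Spark calls used are Dataset.columns(), Dataset.toJavaRDD() and SparkSession.createDataFrame(List<Row>, StructType).

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

public class HeaderFromSchema {

    // Hypothetical stand-in for modelDS: two numeric columns named "x" and "y".
    static Dataset<Row> sampleDataset(SparkSession spark) {
        StructType schema = new StructType()
                .add("x", DataTypes.DoubleType)
                .add("y", DataTypes.DoubleType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(1.0, 2.0),
                RowFactory.create(3.0, 4.0));
        return spark.createDataFrame(rows, schema);
    }

    // Reads the column names from the schema; no data row is touched.
    static String[] headerOf(Dataset<Row> ds) {
        return ds.columns();
    }

    public static void main(String[] args) {
        // Local session for the sketch only; a real job would use its own master.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("header-sketch")
                .getOrCreate();

        Dataset<Row> modelDS = sampleDataset(spark);
        String[] header = headerOf(modelDS);

        // Under this assumption the RDD contains data rows only, so no filter is needed.
        JavaRDD<Row> dataPoints = modelDS.toJavaRDD();

        System.out.println("Header: " + Arrays.toString(header));
        System.out.println("Data rows: " + dataPoints.count());

        spark.stop();
    }
}
```

If the first row of my RDD still contains the column names as values, then the header was presumably read as data when the Dataset was created (for example, a CSV read without the "header" option set to "true"), and the fix would belong at that point rather than in a filter.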
My question is:
Is there an efficient way to access the header and eliminate it?
Many thanks in advance for your help and suggestions.
Regards, Carlo