I would like to apply a regression to my data. One step of the workflow is to prepare my data as a JavaRDD, starting from a Dataset that has a header. This is what I did:

== Step 1: transform the Dataset into a JavaRDD

    JavaRDD<Row> dataPointsWithHeader = modelDS.toJavaRDD();

== Step 2: take the first row (I assumed it was the header)

    Row header = dataPointsWithHeader.first();

== Step 3: eliminate the header row by filtering it out

    JavaRDD<Row> dataPointsWithoutHeader = dataPointsWithHeader.filter((Row row) -> {
        return !row.equals(header);
    });

The issues with the above approach are:

a) the result of Step 2 is not the header row;

b) Step 3 is very inefficient if there is a direct way to access the header.

My question is:

Is there an efficient way to access the header and eliminate it?

Many thanks in advance for your help and suggestions.

Regards, Carlo

Carlo Allocca

1 Answer

This should be super simple.

If you need the header of the given Dataset, you can use a built-in function:

    String[] header = modelDS.columns();

You can perform all your operations directly on the modelDS variable itself.
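To illustrate, here is a minimal sketch. It assumes the Dataset was loaded with the CSV reader's `header` option, which the question does not show, so treat that part as a guess. In that case the column names live in the Dataset's schema rather than in a data row, so the rows returned by `toJavaRDD()` contain only data and there is nothing to filter out. The class name `HeaderExample`, the helper `loadSample`, and the sample values are hypothetical and only there to make the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HeaderExample {

    // Hypothetical helper: builds a small Dataset from in-memory CSV lines.
    // Assumption: the real modelDS is read in a similar way, with header=true.
    static Dataset<Row> loadSample(SparkSession spark) {
        List<String> lines = Arrays.asList("x,y", "1.0,2.0", "3.0,4.0");
        Dataset<String> csvLines = spark.createDataset(lines, Encoders.STRING());
        // With header=true, the first line becomes the column names
        // and is not kept as a data row.
        return spark.read().option("header", "true").csv(csvLines);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HeaderExample")
                .master("local[1]")
                .getOrCreate();

        Dataset<Row> modelDS = loadSample(spark);

        // The header is available from the schema, no row scan needed.
        String[] header = modelDS.columns();

        // These rows are data only, so no filter step is required.
        JavaRDD<Row> dataPoints = modelDS.toJavaRDD();

        System.out.println(Arrays.toString(header));
        System.out.println(dataPoints.count());

        spark.stop();
    }
}
```

If the Dataset was instead read without the `header` option, the header line would show up as an ordinary row, and `first()` on the RDD is not guaranteed to return it once the data is split across partitions. Re-reading the source with `header` set to `true` is usually simpler than filtering afterwards.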

sriramkumar