
I have a dataset on which I want to perform a supervised learning task (regression, decision tree). The dataset is here: http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

The data are in a plain-text file in the format:

Data1 Data2 Data3 .....

I checked the Spark MLlib tutorial at https://spark.apache.org/docs/latest/mllib-decision-tree.html. Its examples use data in LIBSVM format, so they load it with the loadLibSVMFile function.
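One possible workaround is to rewrite the plain-text file into LIBSVM format once, so that loadLibSVMFile can read it unchanged. The sketch below assumes every column is numeric, columns are separated by whitespace, and the class label sits in the last column; the helper names `to_libsvm_line` and `convert_file` are hypothetical. LIBSVM lines have the form `label index:value ...` with 1-based, ascending indices, and zero-valued features may be omitted.

```python
def to_libsvm_line(line, label_col=-1):
    """Convert one whitespace-separated line of numbers into LIBSVM format.

    Assumes all columns are numeric and the label is in column `label_col`
    (the last column by default). Feature indices are 1-based, as LIBSVM
    requires; zero-valued features are skipped because the format is sparse.
    """
    values = line.split()
    label = values.pop(label_col)
    features = " ".join(
        f"{i}:{v}" for i, v in enumerate(values, start=1) if float(v) != 0.0
    )
    return f"{label} {features}".rstrip()


def convert_file(src_path, dst_path):
    """Rewrite a whole plain-text file as a LIBSVM file (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(to_libsvm_line(line) + "\n")
```

The converted file can then be passed to `MLUtils.loadLibSVMFile`. If the class is coded 1/2, it would still need shifting to 0/1 before training a binary classifier, because MLlib classifiers expect labels in 0 .. numClasses - 1.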

My question is: since my data are not in this format, how can I load them and supply the labels? Which method should I use?
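A minimal PySpark sketch of loading the file directly, without LIBSVM, is to read it with `sc.textFile` and map each line to a `LabeledPoint` yourself. It assumes numeric, whitespace-separated columns with the class in the last column coded 1/2 (shifted here to 0/1); `parse_point` and `load_labeled_points` are hypothetical names, and the pyspark import is placed inside the function so the parser can be used on its own.

```python
def parse_point(line):
    """Split a whitespace-separated line into (label, features).

    Assumes all columns are numeric and the last column is the class,
    coded 1/2; it is shifted to 0/1 because MLlib classifiers expect
    labels in {0, ..., numClasses - 1}.
    """
    values = [float(x) for x in line.split()]
    return values[-1] - 1.0, values[:-1]


def load_labeled_points(sc, path):
    """Read a plain-text file into an RDD of LabeledPoint (requires pyspark)."""
    # Imported here so parse_point works without a Spark installation.
    from pyspark.mllib.regression import LabeledPoint

    return (
        sc.textFile(path)
        .filter(lambda line: line.strip())  # drop blank lines
        .map(parse_point)
        .map(lambda lf: LabeledPoint(lf[0], lf[1]))
    )
```

Given a SparkContext `sc`, the resulting RDD could be passed to `DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={})` from `pyspark.mllib.tree`. An empty `categoricalFeaturesInfo` treats every feature as continuous, which is only an approximation if some columns are really category codes.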

I checked the MLUtils documentation, and no method strikes me as the one I need: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/mllib/util/MLUtils.html

Thanks in advance

Michail N
