Load Data for Machine Learning in Spark

Asked Sep 05 '17 at 07:31

Active Sep 05 '17 at 07:31

Viewed 78 times

I have a dataset on which I want to perform some supervised learning tasks (regression, decision tree). The dataset is here: http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

The data is in a plain text file in the format

Data1 Data2 Data3 .....

I checked the Spark MLlib tutorial at https://spark.apache.org/docs/latest/mllib-decision-tree.html. It uses data in LIBSVM format, which it loads with the loadLibSVMFile function.

My question is this: since my data is not in this format, how can I load it and pass the labels for the variables? Which method should I use?

Thanks in advance

Michail N

Please look at https://stackoverflow.com/questions/29829531/making-an-input-text-file-as-a-training-data-set-in-libsvm and http://www.csie.ntu.edu.tw/~cjlin/libshorttext/doc/converter.html – Ramesh Maharjan Sep 05 '17 at 08:41
Can't I do it natively in Scala? – Michail N Sep 05 '17 at 08:44
Check https://stackoverflow.com/questions/41416291/how-to-prepare-data-into-a-libsvm-format-from-dataframe – Ramesh Maharjan Sep 05 '17 at 08:47
Thanks, that is helpful indeed. – Michail N Sep 05 '17 at 08:53
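
For illustration, the conversion discussed in these comments could look roughly like the sketch below. It is plain Python rather than Scala, and it rests on assumptions not stated in the question: every column is numeric (as in the numeric variant of the German Credit file, assumed here to be the one used), the label is the last column, and the optional label_map (a hypothetical helper argument) remaps labels such as 1/2 to 0/1. The function names are made up for this example and are not part of Spark.

```python
def parse_row(line):
    """Split one whitespace-separated numeric row into (label, features).

    Assumption: the label is the LAST column of the row.
    """
    values = [float(tok) for tok in line.split()]
    if len(values) < 2:
        raise ValueError("expected at least one feature and a label")
    return values[-1], values[:-1]


def to_libsvm(label, features):
    """Format one example as 'label index:value ...' with 1-based indices.

    Zero-valued features are omitted, which the sparse format allows.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0.0]
    return " ".join([f"{label:g}"] + pairs)


def convert_lines(lines, label_map=None):
    """Convert raw text rows to LIBSVM-format lines, skipping blank rows.

    label_map is optional, e.g. {1.0: 0.0, 2.0: 1.0} to get 0-based classes.
    """
    converted = []
    for line in lines:
        if not line.strip():
            continue
        label, features = parse_row(line)
        if label_map is not None:
            label = label_map[label]
        converted.append(to_libsvm(label, features))
    return converted


if __name__ == "__main__":
    sample = ["1 0 3.5 2", "4 2 0 1"]
    for row in convert_lines(sample, label_map={1.0: 0.0, 2.0: 1.0}):
        print(row)
```

Writing the converted lines to a text file should give something that loadLibSVMFile can read, as in the tutorial. The same per-row split (last value as label, the rest as features) could presumably also be applied directly inside Spark without going through a file.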

0 Answers