
I am trying to run an algorithm in Apache Spark. I am getting the error "Java - A master URL must be set in your configuration" even though I set the configuration:

SparkSession spark = SparkSession.builder().appName("Sp_LogistcRegression").config("spark.master", "local").getOrCreate();

This is the code I work with:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.mllib.util.MLUtils;

public class Sp_LogistcRegression {
    public void trainLogisticregression(String path, String model_path) throws IOException {
        //SparkConf conf = new SparkConf().setAppName("Linear Regression Example");


    //  JavaSparkContext sc = new JavaSparkContext(conf);
        SparkSession spark = SparkSession.builder().appName("Sp_LogistcRegression").config("spark.master", "local").getOrCreate();
        Dataset<Row> training = spark.read().option("header", "true").csv(path);
        System.out.print(training.count());

        LogisticRegression lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3);

        // Fit the model
        LogisticRegressionModel lrModel = lr.fit(training);
        lrModel.save(model_path);

        spark.close();

    }

}

This is my test case:

import java.io.File;
import java.io.IOException;

import org.junit.Test;

public class Sp_LogistcRegressionTest {
    Sp_LogistcRegression spl = new Sp_LogistcRegression();

    @Test
    public void test() throws IOException {
        String filename = "datas/seg-large.csv";
        ClassLoader classLoader = getClass().getClassLoader();
        File file1 = new File(classLoader.getResource(filename).getFile());
        spl.trainLogisticregression(file1.getAbsolutePath(), "/tmp");
    }
}

Why am I getting this error? I checked the solutions here: Spark - Error "A master URL must be set in your configuration" when submitting an app. It doesn't work. Any clues?

Kumaresp

1 Answer


Your

SparkSession spark = SparkSession.builder().appName("Sp_LogistcRegression").config("spark.master", "local").getOrCreate();

should be

SparkSession spark = SparkSession.builder().appName("Sp_LogistcRegression").master("local").getOrCreate();

Or

when you run it, pass the master explicitly to spark-submit:

spark-submit --class mainClass --master local yourJarFile
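A common way to reconcile the two approaches is to avoid hardcoding the master at all. The sketch below (the helper name `MasterResolver` is illustrative, not from the question) reads the `spark.master` JVM system property, which `spark-submit --master` sets on the driver, and falls back to local mode so unit tests like the one above still run standalone:

```java
// Sketch (assumption: helper name and fallback value are illustrative).
// Resolve the master URL at runtime instead of hardcoding it, so the same
// code works both under spark-submit and in a local JUnit test.
public class MasterResolver {

    static String resolveMaster() {
        // spark-submit --master populates the "spark.master" system property
        // on the driver JVM; in a plain unit test it is unset.
        String fromSubmit = System.getProperty("spark.master");
        return fromSubmit != null ? fromSubmit : "local[*]";
    }

    public static void main(String[] args) {
        // In the question's code this would be used as:
        // SparkSession spark = SparkSession.builder()
        //         .appName("Sp_LogistcRegression")
        //         .master(resolveMaster())
        //         .getOrCreate();
        System.out.println(resolveMaster());
    }
}
```

This way `.master(...)` is always set, so the "A master URL must be set" error cannot occur, and a cluster master supplied at submit time still takes effect.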
Ramesh Maharjan
  • org.apache.hadoop.util.NativeCodeLoader : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable – Kumaresp Jun 28 '17 at 10:27
  • That's just a warning. See https://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-warning – Ramesh Maharjan Jun 28 '17 at 11:51
  • How to remove that warning ? – Kumaresp Jun 28 '17 at 12:48
  • The prebuilt Hadoop distributions ship 32-bit native libraries. You will have to build Hadoop with 64-bit native libraries. Check http://kiwenlau.blogspot.com/2015/05/steps-to-compile-64-bit-hadoop-230.html and https://dataheads.wordpress.com/2013/12/10/hadoop-2-setup-on-64-bit-ubuntu-12-04-part-3/ – Ramesh Maharjan Jun 28 '17 at 12:54
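If rebuilding Hadoop is not an option, the warning can also simply be silenced at the logging level. A minimal fragment for the driver's `log4j.properties` (assuming the log4j 1.x configuration that Spark 2.x uses by default):

```properties
# Raise the log threshold for the class that emits the
# "Unable to load native-hadoop library" warning.
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
```

This hides the message without changing behavior; Spark keeps using the builtin-java implementations.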