
I'm still new to Spark. I wrote this code to parse a large list of strings.

I had to use `forEachRemaining` because I have to initialize some non-serializable objects in each partition.

    JavaRDD<String> lines = initRetriever();
    lines.foreachPartition(iter -> {
        // Initialize the non-serializable objects once per partition
        Obj1 obj1 = initObj1();
        MyStringParser parser = new MyStringParser(obj1);
        iter.forEachRemaining(str -> {
            try {
                parser.parse(str);
            } catch (ParsingException e) {
                e.printStackTrace();
            }
        });

        System.out.println("all parsed");
        obj1.close();
    });

I believe Spark is all about parallelism, but this program uses only a single thread on my local machine. Did I do something wrong? Am I missing some configuration? Or does the iterator prevent it from executing in parallel?

EDIT

I have no configuration files for Spark.

This is how I initialize Spark:

    SparkConf conf = new SparkConf()
            .setAppName(AbstractSparkImporter.class.getCanonicalName())
            .setMaster("local");

I run it from my IDE and with the `mvn exec` command.


1 Answer


As @Alberto-Bonsanto indicated, setting the master to `local[*]` triggers Spark to use all available threads (see the Spark documentation on master URLs for more details).

    SparkConf conf = new SparkConf()
            .setAppName(AbstractSparkImporter.class.getCanonicalName())
            .setMaster("local[*]");