I have a problem when running a Spark script on a multicore machine. I am on an Ubuntu 16.04 machine. I've installed Spark as follows:
(1) I downloaded spark-2.4.0-bin-hadoop2.7.tgz and unpacked it into /usr/local/spark. I added Spark's bin folder to PATH by adding "export PATH=/usr/local/spark/bin:$PATH" to my .bashrc file. I installed Java with apt-get: apt-get install oracle-java8-installer and apt-get install oracle-java8-set-default
(2) I am using the following script:
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # Run locally, using as many worker threads as there are cores
    conf = SparkConf().setAppName("word count").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("/path/to/file/word_count.txt")
    words = lines.flatMap(lambda line: line.split(" "))
    wordCounts = words.countByValue()

    # Try to read back how many executors/cores the job actually got
    executor_count = len(sc._jsc.sc().statusTracker().getExecutorInfos()) - 1
    cores_per_executor = int(sc.getConf().get('spark.executor.cores', '1'))

    # for word, count in wordCounts.items():
    #     print("{} : {}".format(word, count))
    print("Number of executors :" + str(executor_count))
    print("Cores per executor :" + str(cores_per_executor))
(3) When I run it with:
$ spark-submit WordCount.py
I get:
18/11/17 10:42:12 WARN Utils: Your hostname, se8-HVM-domU resolves to a loopback address: 127.0.1.1; using 192.168.91.61 instead (on interface eth0)
18/11/17 10:42:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/11/17 10:42:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Number of executors :0
Cores per executor :1
(4) The machine has 32 cores:
# cat /proc/cpuinfo | grep processor | wc -l
32
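For comparison, the same count from Python's standard library (just a sanity check; os.cpu_count() should match the /proc/cpuinfo count above):

    import os
    print(os.cpu_count())  # should also report 32 on this machine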
If I understand the output in (3) correctly, the script is using just 1 core, even though I configured the SparkContext with "local[*]" (which should use as many cores as are available).
Now the simple question: how do I make the script use more than 1 core?