I'm having trouble understanding the difference between client mode and cluster mode. Let's take an example:
I have test.py with the following:
import time

from pyspark import SparkConf
from pyspark.sql import SparkSession, SQLContext

if __name__ == "__main__":
    conf = (SparkConf()
            .setAppName(appName)
            .set("spark.executor.memory", ?)
            .set("spark.driver.memory", ?)
            .set("spark.executor.memoryOverhead", ?)
            .set("spark.network.timeout", ?)
            .set("spark.files.overwrite", ?)
            .set("spark.executor.heartbeatInterval", ?)
            .set("spark.driver.maxResultSize", ?)
            .set("spark.executor.instances", ?)
            .set("spark.executor.cores", ?)
            .set("spark.driver.cores", ?)
            .set("spark.sql.shuffle.partitions", ?)
            )
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    start_time = time.time()
    sc = spark.sparkContext
    sqlContext = SQLContext(sparkContext=sc)
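For reference, my understanding is that the mode an application actually runs in can be checked from inside test.py; a minimal sketch using the sc defined above (spark.master and spark.submit.deployMode are the standard property names as far as I know):

# Sketch: print the effective master URL and deploy mode of this run
print("master:", sc.master)
print("deploy mode:", sc.getConf().get("spark.submit.deployMode", "client"))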
I'm working on a Linux server over SSH. To run test.py, I have two options:
1- Reserve a node using the following command:
salloc --time=03:00:00 --cpus-per-task=32 --mem=0 --account=def-myName
This command reserves a node for three hours. The node has the following specifications:
Cores: 32
Available memory: 125 GB
CPU type: 2 x Intel E5-2683 v4 "Broadwell" @ 2.1Ghz
Storage: 2 x 480GB SSD
Now, to run test.py, I just type spark-submit test.py. Is this called client mode or cluster mode? If it is client mode, how can I set the following (my own guess is sketched right after this list)?
Master Memory:
Master Cores:
Number of Worker Nodes:
Memory per worker node (gb):
Cores per worker node:
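From what I have read, these fields seem to map to the driver and executor properties already listed in test.py; below is a sketch with placeholder values (just my guess, not verified). I also understand that in client mode spark.driver.memory has to be given to spark-submit (--driver-memory) before the driver JVM starts, so setting it inside the application would have no effect:

# My guess at the mapping, with placeholder values
conf = (SparkConf()
        .setAppName("test")
        .set("spark.driver.cores", "2")        # "Master Cores" (driver cores)
        .set("spark.executor.instances", "4")  # "Number of Worker Nodes" (executors)
        .set("spark.executor.memory", "16g")   # "Memory per worker node"
        .set("spark.executor.cores", "8"))     # "Cores per worker node"
# "Master Memory" (driver memory) would go on the spark-submit line instead,
# e.g. --driver-memory 8g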
2- I can run a job.sh, defined as follows:
#SBATCH --nodes=1
#SBATCH --time=
#SBATCH --mem=128000M
#SBATCH --cpus-per-task=
#SBATCH --ntasks-per-node=
#SBATCH --output=sparkjob-%j.out
#SBATCH --mail-type=ALL
#SBATCH --error=
## send mail to this address
#SBATCH --mail-user=
spark-submit --total-executor-cores xxx --driver-memory xxxx test.py
....
Then I execute the code with sbatch job.sh. Is this way called cluster mode?
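In both options, I assume the values that actually reach the application (for example the --driver-memory passed on the spark-submit line in job.sh) can be printed from test.py, so the two ways can be compared; a small sketch:

# Sketch: print the resource settings the running application ends up with
for key in ("spark.master", "spark.submit.deployMode",
            "spark.driver.memory", "spark.executor.memory",
            "spark.executor.cores"):
    print(key, "=", sc.getConf().get(key, "<not set>"))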