
I tried a simple example on Spark 2.1 (Cloudera):

val flightData2015 = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("/2015-summary.csv")

but when I checked the Spark shell UI, I found it generated three jobs.

I think every action should correspond to one job — am I right? I did some experimenting and found that every option can generate a job. Does option act like an action? Please help me understand this situation.
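For what it's worth, a hedged sketch of how to avoid the extra jobs: `inferSchema` forces Spark to scan the file (and reading the header can trigger another pass), so supplying an explicit schema should reduce the read to a single job. The column names below are assumptions based on the usual `2015-summary.csv` flight-data file — adjust them to your actual header:

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType, LongType}

// Assumed columns; replace with the real ones from your CSV header.
val flightSchema = StructType(Seq(
  StructField("DEST_COUNTRY_NAME", StringType),
  StructField("ORIGIN_COUNTRY_NAME", StringType),
  StructField("count", LongType)
))

val flightData2015 = spark
  .read
  .schema(flightSchema)       // no schema-inference scan over the file
  .option("header", "true")   // header row is skipped, not inferred
  .csv("/2015-summary.csv")
```

With the schema given up front, no job should run until you call an actual action such as `count()` or `show()`.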

yuxh
    [Why does SparkSession execute twice for one action?](https://stackoverflow.com/q/38924623/10465355) – 10465355 Dec 21 '18 at 13:08

1 Answer


@yuxh, it's because of defaultMinPartitions, which has been set to 3. It reflects the parallelism when a Spark job is executed. You can change it globally in yarn-site.xml, or dynamically for a specific job by issuing sqlContext.setConf("spark.sql.shuffle.partitions", "your value")
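If you want to try the suggestion above, a minimal sketch (the value `10` is purely illustrative, and `spark.conf.set` is the Spark 2.x equivalent of the `sqlContext.setConf` call mentioned):

```scala
// Per-session: takes effect for subsequent shuffles in this SparkSession
spark.conf.set("spark.sql.shuffle.partitions", "10")

// Or at submit time:
// spark-submit --conf spark.sql.shuffle.partitions=10 ...
```

Note that this setting controls the number of partitions used for shuffles, which is separate from how many jobs a `read` triggers.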

Subash
  • I don't think so. I can reduce the number of jobs by deleting the option function, and the jobs have nothing to do with parallelism – yuxh Dec 21 '18 at 11:51
  • What was the reason for the downvote? Did you try applying the settings and starting the Spark job? – Subash Dec 21 '18 at 16:28