
I have been working with Spark for a while. Recently I came across a strange scenario, and I am trying to find its root cause.

My question is:

Why do I get different output with .setMaster("local[*]") and .setMaster("local[3]")?

As I currently understand it, * dynamically uses all the cores of the local machine, whereas in the latter case we manually specify the number of cores used to execute the program.

My problem is that whenever I use *, I get some undesired results. When I run the same code with the number of cores given manually, it gives the correct result.

I am running the application on a 4-core CPU.

Since people are marking this question as a duplicate, I'll try to explain in more detail. I have an RDD of id and timestamp, and I am trying to find gaps of more than 15 minutes between two consecutive rows, using the following code:

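    // moveLastGpsdt, imei and Arrayimeistoppage are vars declared outside this closure
    val lists = rdd.zipWithIndex().map(p => {
      if (p._2 == 0) {
        // first row of the whole RDD: remember its timestamp and imei
        moveLastGpsdt = p._1.gpsdt
        imei = p._1.imei
      } else if (p._2 > 0) {
        // milliseconds between this row and the remembered timestamp
        val timeDiffs = p._1.gpsdt.getTime() - moveLastGpsdt.getTime()
        // same device and more than 15 minutes (900000 ms) apart
        if (p._1.imei.equals(imei) && timeDiffs > 900000L) {
          println("Unreachable " + moveLastGpsdt + " " + p._1.gpsdt)
          Arrayimeistoppage = events(p._1.imei, "Unreachable", moveLastGpsdt, p._1.gpsdt)
        }
      }
      Arrayimeistoppage
    })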
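For comparison, the computation described above (flag two consecutive rows of the same imei that are more than 15 minutes apart) can be written without any shared mutable state. This is a minimal sketch that assumes the rows are already sorted by imei and timestamp and that timestamps are epoch milliseconds; `GpsRow`, `Gap`, `GapFinder` and `findGaps` are hypothetical names standing in for the question's row type and `events`.

```scala
// Hypothetical stand-ins for the question's RDD rows and its `events` type.
final case class GpsRow(imei: String, gpsdt: Long) // gpsdt as epoch milliseconds (assumption)
final case class Gap(imei: String, label: String, from: Long, to: Long)

object GapFinder {
  val MaxGapMillis: Long = 15 * 60 * 1000L // 900000 ms, the threshold used in the question

  // Pairs every row with the row that follows it and keeps the pairs that belong
  // to the same imei and are more than 15 minutes apart.
  // Assumes `rows` is already sorted by (imei, gpsdt).
  def findGaps(rows: Seq[GpsRow]): Seq[Gap] =
    rows.zip(rows.drop(1)).collect {
      case (prev, cur) if prev.imei == cur.imei && cur.gpsdt - prev.gpsdt > MaxGapMillis =>
        Gap(cur.imei, "Unreachable", prev.gpsdt, cur.gpsdt)
    }
}
```

Because each result depends only on a pair of adjacent rows, nothing has to be remembered between elements. On a Spark DataFrame the same pairing is usually expressed with `lag` over a window partitioned by imei and ordered by timestamp.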

Now, I have a set of records. When I run with local[*], some data is skipped, whereas if I use local[1]/local[2]/local[3], I get the correct result with all rows. Checking rdd.partitions, I get 4 partitions with local[*]. I have a 4-core CPU, but as I understand it, 1 core is used by the OS and only the other cores can be used for processing. So how can local[*] get 4 cores? Shouldn't the maximum be 3?
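One plausible reason the result depends on the number of partitions is that the code updates `var`s captured by the closure. Spark serializes the closure for each task, so each task typically works on its own copy of those variables, and an update made while processing one partition is not seen by another. The pure-Scala sketch below models that behaviour without Spark; it is a simplified assumption, not Spark's actual scheduling, and `PartitionedClosureDemo`, `splitIntoPartitions` and `countGaps` are hypothetical names.

```scala
object PartitionedClosureDemo {
  // Splits the data into contiguous chunks, a rough stand-in for RDD partitions.
  def splitIntoPartitions[A](data: Seq[A], numPartitions: Int): List[Seq[A]] = {
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    data.grouped(chunkSize).toList
  }

  // Counts gaps larger than maxGap between consecutive timestamps. The mutable
  // `prev` mimics a captured var: every partition starts with its own empty copy,
  // so a pair of rows that straddles a partition boundary is never compared.
  def countGaps(timestamps: Seq[Long], numPartitions: Int, maxGap: Long): Int =
    splitIntoPartitions(timestamps, numPartitions).map { partition =>
      var prev: Option[Long] = None
      var found = 0
      partition.foreach { ts =>
        if (prev.exists(p => ts - p > maxGap)) found += 1
        prev = Some(ts)
      }
      found
    }.sum
}
```

With the timestamps `Seq(0L, 1000L, 2000000L, 2001000L)` and a 900000 ms threshold, a single partition finds the one gap, while splitting into two or four partitions finds none because the gap falls exactly on a partition boundary. Since the question observes 4 partitions under local[*], the partition count is worth checking alongside the core count.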

Pinnacle

2 Answers


When you use local[*], Spark will use all cores on the driver machine. When you specify local[3], Spark will use only 3 cores.
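A minimal sketch of how a local master string maps to a number of worker threads. `LocalMaster.threadCount` is a hypothetical helper, not a Spark API; it assumes, as Spark's local mode does, that `local[*]` resolves to `Runtime.getRuntime.availableProcessors()`. That value counts every logical core available to the JVM and does not subtract one for the OS, which is why a 4-core machine reports 4.

```scala
object LocalMaster {
  // Matches "local[N]" or "local[*]" and captures N or "*".
  private val LocalN = """local\[([0-9]+|\*)\]""".r

  // Hypothetical helper: returns how many worker threads a local master string implies.
  def threadCount(master: String,
                  availableCores: Int = Runtime.getRuntime.availableProcessors()): Int =
    master match {
      case "local"     => 1              // plain "local" means a single thread
      case LocalN("*") => availableCores // one thread per logical core reported by the JVM
      case LocalN(n)   => n.toInt        // explicit count; may exceed the core count
      case other       => throw new IllegalArgumentException(s"not a simple local master: $other")
    }
}
```

For example, `LocalMaster.threadCount("local[3]")` gives 3 regardless of the machine, while `LocalMaster.threadCount("local[*]")` gives whatever the JVM reports for the current host.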

merenptah
Yes, you are right. I understand this, but why does it give a different result set on execution? Please re-read my question; I have tried to explain my query in more detail. – Pinnacle Sep 10 '18 at 08:59

When you set local[*], Spark uses all the available cores on your machine, i.e. each core can run a separate thread for data processing. local[3] means Spark uses only 3 cores and can execute only 3 parallel tasks at a time. Instead of giving *, it's advisable to oversubscribe the value: if your machine has an octa-core CPU, oversubscribe to 12 or more. If your cores are hyper-threaded, Spark can also use the extra logical cores.
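The "only N parallel tasks at a time" point can be illustrated with a rough model. This sketch assumes a fixed-size Java thread pool as a stand-in for local[N]; Spark's local scheduler is implemented differently, and `ConcurrencyDemo` and `maxConcurrentTasks` are hypothetical names.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object ConcurrencyDemo {
  // Runs numTasks short tasks on a fixed pool of `threads` threads (a stand-in for
  // local[threads]) and returns the largest number of tasks seen running at once.
  def maxConcurrentTasks(threads: Int, numTasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    val running = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    try {
      for (_ <- 1 to numTasks) {
        pool.execute(new Runnable {
          def run(): Unit = {
            val now = running.incrementAndGet()
            // record a new peak with a compare-and-set loop
            var seen = peak.get()
            while (now > seen && !peak.compareAndSet(seen, now)) seen = peak.get()
            try Thread.sleep(20) // simulate some work
            finally running.decrementAndGet()
          }
        })
      }
    } finally {
      pool.shutdown()
      pool.awaitTermination(30, TimeUnit.SECONDS)
    }
    peak.get()
  }
}
```

However many tasks are submitted, the observed peak never exceeds the pool size, which mirrors the claim that local[3] runs at most 3 tasks concurrently.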

Chandan Ray