I have been working with Spark for a while. Recently I came across a strange scenario, and I am trying to find the root cause.
Why do I get different output with `.setMaster("local[*]")` and `.setMaster("local[3]")`?
As per my current understanding, `*` dynamically allocates the cores from the local system, whereas in the latter case we manually specify the number of cores to use for the program.

My problem is that whenever I use `*`, I get undesired results. When I run the same code specifying the cores manually, it gives the correct result. I am running the application on a 4-core CPU.
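As far as I can tell, `local[*]` sizes the scheduler from `Runtime.getRuntime.availableProcessors()`, so I checked what that reports on my machine (a quick plain-Scala sketch, no Spark needed):

```scala
// local[*] uses the JVM's view of the machine: the number of
// *logical* cores, which can exceed physical cores when
// hyper-threading is enabled, and includes cores the OS also uses.
val cores = Runtime.getRuntime.availableProcessors()
println(s"local[*] would run with $cores worker threads")
```

On my 4-core CPU this prints 4, which matches the 4 partitions I observe below.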
Since people are marking it as a duplicate, I'll try to explain in more depth. I have an RDD of id and timestamp, and what I am trying to achieve is finding gaps of more than a 15-minute interval between two consecutive rows, using the following code:
```scala
// moveLastGpsdt, imei and Arrayimeistoppage are vars declared in the
// driver before this block; they are captured by the map closure
val lists = rdd.zipWithIndex().map(p => {
  if (p._2 == 0) {
    // first row: just remember its timestamp and imei
    moveLastGpsdt = p._1.gpsdt
    imei = p._1.imei
  } else if (p._2 > 0) {
    val timeDiffs = p._1.gpsdt.getTime() - moveLastGpsdt.getTime()
    if (p._1.imei.equals(imei) && timeDiffs > 900000L) { // 15 min in ms
      println("Unreachable " + moveLastGpsdt + " " + p._1.gpsdt)
      Arrayimeistoppage = events(p._1.imei, "Unreachable", moveLastGpsdt, p._1.gpsdt)
    }
  }
  Arrayimeistoppage
})
```
Now, I have a set of records. When I run with `local[*]`, it skips some data, whereas if I use `local[1]`/`local[2]`/`local[3]`, it gives the proper result with all rows. I checked with `rdd.partitions.size` and get 4 partitions in the case of `local[*]`. I have a 4-core CPU, but as per my understanding 1 core is used by the OS and only the other cores can be used for processing. So how can we get 4 cores in the case of `local[*]`? The maximum should be 3, right?
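To rule out Spark itself, I tried to reproduce the skipping with plain Scala (made-up minute-valued timestamps; each "partition" gets a fresh copy of the tracking variable, which I believe mimics each Spark task getting its own copy of the closure variables like `moveLastGpsdt`):

```scala
// Simulates gap detection when the data is split into "partitions":
// each partition starts with a fresh copy of the closure state,
// so a gap that straddles a partition boundary is never compared.
def gaps(partitions: Seq[Seq[Int]]): Seq[(Int, Int)] =
  partitions.flatMap { part =>
    var last = Int.MinValue // fresh per partition, like a task-local var
    part.flatMap { t =>
      val g = if (last != Int.MinValue && t - last > 15) Some((last, t)) else None
      last = t
      g
    }
  }

val data = Seq(1, 2, 10, 11, 40, 41) // timestamps in minutes (made up)

// one partition (like local[1]): the 11 -> 40 gap is found
println(gaps(Seq(data)))                           // List((11,40))
// the gap straddles a partition boundary (more partitions): it is missed
println(gaps(Seq(Seq(1, 2, 10, 11), Seq(40, 41)))) // List()
```

With one chunk the 11 → 40 gap is detected, but with two chunks it disappears, which looks exactly like the rows being "skipped" under `local[*]`.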