Why does the accumulator variable in the following code does not print the aggregated string?
object mapRDD {
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder()
.master("local")
.appName("sparkSessionName")
.getOrCreate()
spark.sparkContext.setLogLevel("WARN")
val data = Seq("Project",
"Gutenberg’s",
"Alice’s",
"Adventures",
"in",
"Wonderland")
val rdd = spark.sparkContext.parallelize(data)
var accumulator: String = "WHY IS THE AGGREGATED STRING NOT PRINTED?"
for (eachElementOfRDD <- rdd) {
accumulator = accumulator ++ eachElementOfRDD
}
println(accumulator)
}
}
Output 21/05/13 09:10:52 INFO BlockManagerMasterEndpoint: Registering block manager host.docker.internal:64786 with 894.3 MB RAM, BlockManagerId(driver, host.docker.internal, 64786, None) 21/05/13 09:10:52 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, host.docker.internal, 64786, None) 21/05/13 09:10:52 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, host.docker.internal, 64786, None) WHY IS THE AGGREGATED STRING NOT PRINTED?
I am a newbie to spark and scala and I understand the output of the following code is correct. What I want to know is the reason for such a behaviour. What this concept is called and some pointers to understand it.
EDIT The variable name accumulator was a co-incidence to the accumulator functionality of spark. I am concerned with adding a string to the original string until the loop is completed.