I want to know how transient variables are made available on the workers. For example, a map task is sent from the driver to an executor by serializing the MapFunction object. The executor deserializes it and executes it on a partition. If I use a transient variable inside that MapFunction, how is it available on the workers, given that it is not serialized and sent to them?
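To make the expectation concrete, here is a minimal sketch in plain Java (no Spark involved) of what I understand serialization to do with a transient instance field. The class names `TransientDemo` and `Task` and the field names are hypothetical and only stand in for a function object that is shipped to an executor; the in-memory round trip is just an approximation of the driver-to-executor transfer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for a function object shipped to an executor.
    static class Task implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "map-task";                 // written to the byte stream
        transient String handle = "driver-only";  // skipped by Java serialization
    }

    // Serialize and deserialize in memory, roughly mimicking a driver -> executor transfer.
    static Task roundTrip(Task task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Task) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Task copy = roundTrip(new Task());
        System.out.println(copy.name);    // prints "map-task"
        System.out.println(copy.handle);  // prints "null": the transient field was not restored
    }
}
```

The field initializer for `handle` does not run again on deserialization, which is why the copy sees null. This is the behavior I expected for the transient variable on the workers.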
The same question applies to the example at the following link: https://www.mapr.com/blog/how-log-apache-spark
Example:
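    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    class Test {
        // Static and transient: never part of any serialized object.
        transient static SparkSession sparkSession;

        public static void main(String[] args) {
            sparkSession = // Initialize SparkSession
            Dataset<Row> dataset = sparkSession.read().csv("A.csv");
            dataset.createOrReplaceTempView("TEMP_TABLE");
            Dataset<Row> dataset2 = sparkSession.sql("SELECT * FROM TEMP_TABLE");
            // The lambda below uses sparkSession inside the map function.
            Dataset<String> stringDataset = dataset2.map((MapFunction<Row, String>) (row) -> {
                Dataset<Row> tempDataset = sparkSession.sql("SELECT NAME FROM TEMP_TABLE WHERE ID='" + row.getString(0) + "'");
                String temp = tempDataset.first().getString(0);
                return temp;
            }, Encoders.STRING());
            stringDataset.show();
        }
    }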
In the above example, how was sparkSession resolved on the workers? It was created on the driver, and it was not serialized with the closure that was sent to the workers, so I expected it to be null there. It was not null. Why?
Since sparkSession is a static variable, it is stored with the class definition rather than in any object. When the closure is sent to the workers, is the Test class definition also sent to them along with the serialized closure?
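A small plain-Java sketch of the static case may help frame the question. It assumes ordinary Java serialization of a lambda; the names `StaticFieldDemo`, `SerializableSupplier`, `session`, and `valueSeenAfterRoundTrip` are hypothetical and not part of Spark. The lambda reads a static field, the field is changed after the lambda has been serialized, and the deserialized copy sees whatever the field holds in the current JVM, which suggests the value is not carried in the serialized closure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class StaticFieldDemo {

    // Static state belongs to the class as loaded in the current JVM, not to any object.
    static String session;

    // Hypothetical serializable function type, similar in spirit to a serializable map function.
    interface SerializableSupplier extends Supplier<String>, Serializable {}

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Serializes a lambda that reads the static field, changes the field, then
    // deserializes the lambda and returns what it reads.
    static String valueSeenAfterRoundTrip(String before, String after) throws Exception {
        session = before;
        SerializableSupplier closure = () -> session; // captures nothing; reads the static field when called
        byte[] wire = serialize(closure);
        session = after; // stands in for "a JVM where the field holds a different value"
        SerializableSupplier copy = (SerializableSupplier) deserialize(wire);
        return copy.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(valueSeenAfterRoundTrip("set-in-main", null));           // prints "null"
        System.out.println(valueSeenAfterRoundTrip("set-in-main", "set-locally"));  // prints "set-locally"
    }
}
```

If this reasoning carries over to Spark, a non-null sparkSession inside the map function would be expected when the driver and the executor code run in the same JVM (for example with a local master, which is an assumption here since the master setting is not shown above), and a null value would be expected in a separate executor JVM where main() never ran.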