I am new to Spark. In our project we use Spark Structured Streaming to write a Kafka consumer. We have a use case where I need to modularize the code so that several people can work on different pieces of the Spark job simultaneously.
In the first step we read different Kafka topics, which gives me two datasets, say ds_input1 and ds_input2.
I need to pass these to the next step, which another person is working on. So I did the following in Java 8:
class DriverClass {
    public static void main(String[] args) {
        Dataset<Row> ds_input1 = ...; // populated from a Kafka topic
        Dataset<Row> ds_output1 = null;
        SecondPersonClass.process(ds_input1, ds_output1);
        // Back in the caller, ds_output1 is still null.
        // Why does this not behave like passing a List<Object> in Java?
        // Am I doing something wrong? What is the correct way to do this?
        Dataset<Row> ds_output2 = null;
        ThirdPersonClass.process2(ds_output1, ds_output2);
        // Back in the caller, ds_output2 is still null.
        // ds_output2 is populated inside the method, so why is it still null here?
    }
}
class SecondPersonClass {
    static void process(Dataset<Row> ds_input1, Dataset<Row> ds_output1) {
        // Business logic that works on the ds_input1 data goes here.
        // The result is then assigned to the output dataset, i.e. ds_output1.
        // For simplicity, let's say:
        ds_output1 = ds_input1;
        // Inside this method ds_output1 has data, i.e. it is not null.
    }
}
class ThirdPersonClass {
    static void process2(Dataset<Row> ds_input2, Dataset<Row> ds_output2) {
        // Business logic that works on the ds_input2 data goes here.
        // The result is then assigned to the output dataset, i.e. ds_output2.
        // For simplicity, let's say:
        ds_output2 = ds_input2;
        // Inside this method ds_output2 has data, i.e. it is not null.
    }
}
Question: Even though the dataset is populated inside the static method, why is the change not reflected outside the method, where the variable is still null? Why does Java's call by reference for objects not work here? How should I handle this?
Can we return multiple Datasets from a method? If so, how?
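For reference, a minimal sketch of the behaviour asked about, assuming a plain List<String> as a stand-in for Dataset<Row> so that it runs without Spark. The names PassByValueDemo, StepResult, processByAssignment, appendTo, process and processBoth are hypothetical and not part of the code above. It illustrates that Java passes object references by value: mutating the object a parameter points to is visible to the caller (which is why a List appears to "work"), but reassigning the parameter is not, so a result has to be returned. Two results can be returned together in a small holder object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in demo: List<String> plays the role of Dataset<Row> (an assumption made
// so that the example has no Spark dependency).
class PassByValueDemo {

    // Hypothetical holder used to return two results from a single step.
    static final class StepResult {
        final List<String> first;
        final List<String> second;

        StepResult(List<String> first, List<String> second) {
            this.first = first;
            this.second = second;
        }
    }

    // Reassigning the parameter only changes this method's local copy of the reference.
    static void processByAssignment(List<String> input, List<String> output) {
        output = input; // not visible to the caller
    }

    // Mutating the object that the parameter refers to IS visible to the caller.
    static void appendTo(List<String> output) {
        output.add("x");
    }

    // Returning the result hands the reference back to the caller.
    static List<String> process(List<String> input) {
        return input;
    }

    // Returning several results at once through the holder.
    static StepResult processBoth(List<String> input1, List<String> input2) {
        return new StepResult(input1, input2);
    }

    public static void main(String[] args) {
        List<String> input1 = Arrays.asList("a", "b");
        List<String> input2 = Arrays.asList("c");

        List<String> output1 = null;
        processByAssignment(input1, output1);
        System.out.println(output1 == null); // true

        List<String> shared = new ArrayList<>();
        appendTo(shared);
        System.out.println(shared.size()); // 1

        output1 = process(input1);
        System.out.println(output1 == input1); // true

        StepResult both = processBoth(input1, input2);
        System.out.println(both.first.size() + " " + both.second.size()); // 2 1
    }
}
```

With Spark the same shape would presumably apply with Dataset<Row> in place of List<String>: each step returns its result and the driver assigns it, since Dataset transformations return new Dataset objects rather than modifying the existing one.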