We are trying to run a Spark job on Azure Databricks, but it fails with NoSuchMethodError: org.apache.spark.sql.Dataset.exprEnc()Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;. We are using Databricks 10.4 LTS with Spark 3.2.0-SNAPSHOT. The relevant code block is below.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Test1 implements MapPartitionsFunction<Row, Row> {

    private void doTest() {
        SparkSession session = SparkSession.builder().appName("DgDB").master("local[*]")
                .getOrCreate();
        Dataset<Row> dataSet = session.read().option("charset", "UTF-8")
                .format("text").load("<filePath>");
        Dataset<Row> singlePartition = dataSet.mapPartitions(this, dataSet.exprEnc()).repartition(1);
    }

    public static void main(String[] args) {
        System.out.println("helloo");
        Test1 test = new Test1();
        test.doTest();
    }

    @Override
    public Iterator<Row> call(Iterator<Row> input) throws Exception {
        while (input.hasNext()) {
            Row row = input.next();
            List<Object> columns = new ArrayList<>();
            for (int i = 0; i < row.length(); i++) {
                columns.add(row.get(i));
            }
            System.out.println("rowss: " + columns);
        }
        return null;
    }
}
I also checked which jar the Dataset class is loaded from and got "file:/databricks/jars/----workspace_spark_3_2--sql--core--core-hive-2.3__hadoop-3.2_2.12_deploy.jar". However, I cannot trace where this jar comes from. Where is it being loaded from?
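For reference, a lookup of this kind can be sketched roughly as follows. This is a minimal sketch that assumes the standard getProtectionDomain().getCodeSource() route; the class name JarLocator and the helper locationOf are hypothetical names used only for illustration. It runs without Spark by inspecting a stand-in class; in the actual job, org.apache.spark.sql.Dataset.class would be passed instead.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or Databricks): reports where the JVM loaded a class from.
public class JarLocator {

    // Returns the code source location as a string, or a marker text when none is recorded
    // (which is typically the case for classes loaded by the bootstrap loader, such as java.lang.String).
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source: bootstrap/platform class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Stand-in class; in the Spark job, pass org.apache.spark.sql.Dataset.class here.
        System.out.println(locationOf(JarLocator.class));
        // A JDK core class, shown for contrast.
        System.out.println(locationOf(String.class));
    }
}
```

The printed location only says which file or directory the class was loaded from on the machine running the code; it does not say how that file was placed on the classpath.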
Could anyone please help fix this issue?
The same code block worked fine on Databricks 7.4 with Spark 2.x.