
We are trying to run a job on Spark on Azure Databricks but are getting the error `NoSuchMethodError: org.apache.spark.sql.Dataset.exprEnc()Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;`. We are using Databricks Runtime 10.4 LTS with Spark 3.2.0-SNAPSHOT. Please find the relevant code block below.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Test1 implements MapPartitionsFunction<Row, Row> {

  private void doTest() {
    SparkSession session = SparkSession.builder()
        .appName("DgDB")
        .master("local[*]")
        .getOrCreate();

    Dataset<Row> dataSet = session.read()
        .option("charset", "UTF-8")
        .format("text")
        .load("<filePath>");

    // This is the call that fails at runtime with NoSuchMethodError
    // on Databricks Runtime 10.4 LTS.
    Dataset<Row> singlePartition = dataSet.mapPartitions(this, dataSet.exprEnc()).repartition(1);
  }

  public static void main(String[] args) {
    System.out.println("helloo");
    Test1 test = new Test1();
    test.doTest();
  }

  @Override
  public Iterator<Row> call(Iterator<Row> input) throws Exception {
    while (input.hasNext()) {
      Row row = input.next();
      List<Object> columns = new ArrayList<>();
      for (int i = 0; i < row.length(); i++) {
        columns.add(row.get(i));
      }
      System.out.println("rowss: " + columns);
    }
    // Returning null here would cause an NPE downstream;
    // return an empty iterator instead.
    return Collections.emptyIterator();
  }
}

Furthermore, I have tried to find out which jar the Dataset class is being loaded from, and I got "file:/databricks/jars/----workspace_spark_3_2--sql--core--core-hive-2.3__hadoop-3.2_2.12_deploy.jar". However, I am not able to track the origin of this jar. Where is this jar loaded from?

Could anyone please help to fix this issue?

The same code block was working fine on Databricks Runtime 7.4 with Spark 2.x.
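To narrow down which jar a class was actually loaded from, and what signature the running JVM really sees for a method, reflection can be used directly on the cluster. The sketch below is illustrative: `ClassOriginCheck` and `describe` are hypothetical names, and `java.lang.String`/`length` stand in for `org.apache.spark.sql.Dataset`/`exprEnc` so the snippet runs anywhere.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class ClassOriginCheck {

    // Prints where a class was loaded from and the actual return type
    // of every public method with the given name, as seen by this JVM.
    static void describe(Class<?> cls, String methodName) {
        // CodeSource is null for bootstrap classes such as java.lang.String;
        // for application/cluster jars it shows the jar's location.
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        System.out.println(cls.getName() + " loaded from: " + source);
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                System.out.println(methodName + " returns: " + m.getReturnType().getName());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // On the cluster this would be (hypothetical usage):
        // describe(Class.forName("org.apache.spark.sql.Dataset"), "exprEnc");
        describe(String.class, "length");  // prints "length returns: int"
    }
}
```

If the printed return type differs from what your code was compiled against, that mismatch is exactly what produces a `NoSuchMethodError` at runtime.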

  • I don't see any method with that name even in Spark 2.x. Maybe it was something added by Databricks? – Gaël J Dec 28 '22 at 19:44
  • 1
    @GaëlJ I guess it's here https://github.com/apache/spark/blob/v3.3.1/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L235 – Dmytro Mitin Dec 29 '22 at 01:54
  • 1
    @user20881825 https://stackoverflow.com/questions/35186/how-do-i-fix-a-nosuchmethoderror https://stackoverflow.com/questions/8168052/java-lang-nosuchmethoderror-when-the-method-definitely-exists https://stackoverflow.com/questions/59706633/nosuchmethoderror-java – Dmytro Mitin Dec 29 '22 at 01:58
  • @GaëlJ Here is this method in output of `javap Dataset` https://gist.github.com/DmytroMitin/cdcfcbfc07648d0b9b2833c42b95007a#file-log-txt-L44 It's a getter to `implicit val`. – Dmytro Mitin Dec 29 '22 at 02:09
  • @user20881825 Maybe some dependency issue. What is step-by-step reproduction? – Dmytro Mitin Dec 29 '22 at 02:11
  • 1
    @GaëlJ It was also present in Spark 2.x https://github.com/apache/spark/blob/v2.4.8/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L208 – Dmytro Mitin Dec 29 '22 at 02:15
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Dec 29 '22 at 06:49
  • 1
    We have a project through which we read the CSV files from Azure and perform some operation on the data. We are using Databricks-connects to submit the job to the databricks. We submit the job using JobAPI. We installed the jar (where the reading code is written) in the cluster libraries. When we submit the job with driver class (which is available in the installed jar), spark cluster starts the job and complete the read operations. The same code and flow was working as expected with Spark7.4 LTS. I have added few more details of the code in the above description. Please check – user20881825 Dec 29 '22 at 08:08
  • Good finding @Dmytro. I only looked at the javadoc. Haven't thought to check further. Thanks :) – Gaël J Dec 29 '22 at 08:39
  • Furthermore, I have tried to find the jar version from where Dataset class is being loaded and I got "file:/databricks/jars/----workspace_spark_3_2--sql--core--core-hive-2.3__hadoop-3.2_2.12_deploy.jar". However, I am not able to track the path of this jar. From where this jar is loading ? – user20881825 Dec 29 '22 at 14:41
  • @user20881825 Can you try to do `javap -cp ----workspace_spark_......_deploy.jar org.apache.spark.sql.Dataset`? Is it similar to my above [gist](https://gist.github.com/DmytroMitin/cdcfcbfc07648d0b9b2833c42b95007a#file-log-txt-L44), i.e. does it contain the `exprEnc` method? – Dmytro Mitin Dec 29 '22 at 16:05
  • @user20881825 *"From where this jar is loading?"* It's hard to say without reproduction. My understanding is `/databricks/jars` is a standard directory https://learn.microsoft.com/en-us/azure/databricks/kb/libraries/replace-default-jar-new-jar https://stackoverflow.com/questions/64525464/adjust-classpath-change-spring-version-in-azure-databricks – Dmytro Mitin Dec 29 '22 at 16:11
  • @DmytroMitin, I have downloaded the jar file "----workspace_spark_3_2--sql--core--core-hive-2.3__hadoop-3.2_2.12_deploy.jar" and verified the `exprEnc` method in it. The method is available but the return type is different: `public BaseExpressionEncoder exprEnc()`. However, the expected return type should be `ExpressionEncoder`. What are your thoughts on this? – user20881825 Dec 29 '22 at 16:56
  • @user20881825 `BaseExpressionEncoder` is weird. I can't find such class in Spark sources. Google doesn't know it. Is it just `BaseExpressionEncoder` without any package like `org.apache.spark.sql.catalyst.encoders` in `org.apache.spark.sql.catalyst.encoders.ExpressionEncoder`? Where does it come from? – Dmytro Mitin Dec 31 '22 at 02:23
  • @user20881825 In principle, wrong return type can lead to `NoSuchMethodError` https://stackoverflow.com/questions/1134054/changing-return-type-of-method-gives-java-lang-nosuchmethoderror – Dmytro Mitin Dec 31 '22 at 02:32
  • The class "BaseExpressionEncoder" exists in spark-catalyst_2.12-3.0.1-SNAPSHOT.jar which implements encoders. – user20881825 Jan 02 '23 at 10:00
  • @user20881825 Well, I don't know what `spark-catalyst_2.12-3.0.1-SNAPSHOT.jar` is but in [spark-catalyst_2.12-3.0.1.jar](https://repo1.maven.org/maven2/org/apache/spark/spark-catalyst_2.12/3.0.1/) I can't find class `BaseExpressionEncoder` (what is the reason to use `SNAPSHOT` if there is final version of `3.0.1` since [Sep 07, 2020](https://mvnrepository.com/artifact/org.apache.spark/spark-catalyst)?) – Dmytro Mitin Jan 02 '23 at 15:58
  • @user20881825 What is the package of this class? I can't see `org.apache.spark.sql.BaseExpressionEncoder` or `org.apache.spark.sql.catalyst.BaseExpressionEncoder` or `org.apache.spark.sql.catalyst.encoders.BaseExpressionEncoder`. – Dmytro Mitin Jan 02 '23 at 15:59
  • @user20881825 Could you provide the output of `javap -cp ----workspace_spark_......_deploy.jar org.apache.spark.sql.Dataset`? – Dmytro Mitin Jan 02 '23 at 16:01
  • @DmytroMitin, first of all, thanks a lot for your quick responses. The package for this class is `org.apache.spark.sql.catalyst.encoders` and the jar is `spark-catalyst_2.12-3.2.0-SNAPSHOT.jar`. I have noticed that this class does not exist in any version of the spark-catalyst jar that exists on Maven. However, the jars which are available on the Databricks cluster contain this class. – user20881825 Jan 02 '23 at 16:44
  • @user20881825 Oh, so databricks seem to use patched/snapshot version of Spark. This can explain your `NoSuchMethodError`. It's possible that one of your dependencies expect standard `spark-catalyst` with `exprEnc` returning `ExpressionEncoder` while in your classpath you have the snapshot version of `spark-catalyst` with `exprEnc` returning `BaseExpressionEncoder`. Do you have some way to manipulate dependencies in databricks? Can you replace snapshot `spark-catalyst` with the standard one? – Dmytro Mitin Jan 02 '23 at 17:26
  • @user20881825 Hi. I registered account at Databricks and used databricks-connect so I can see this patched Databricks `spark-catalyst` jar. I understand that maybe you can't replace it with the standard one. Could you improve your code sample in the question so that I could try to use it for reproduction of `NoSuchMethodError`? Now the code is not self-contained so I can't use it. What dependencies are you using? How do you add them to databricks? Could you run something like https://gist.github.com/DmytroMitin/4a816c11026e51ee5d4cc23135aa75db so that we can see what jars you have in classpath – Dmytro Mitin Jan 03 '23 at 17:30
  • @user20881825 For example my classpath is https://gist.github.com/DmytroMitin/de830a231f397d9da0f41bb3557efbfb – Dmytro Mitin Jan 03 '23 at 17:32
  • @DmytroMitin, sorry for the delay in responding. I have written a standalone program which reproduces this issue (I updated the code in the description) and I hope this will help you understand the case. You just need to replace the "filepath" variable with the actual file path that you will be reading. I created a jar containing this class and created the Databricks Spark job using that jar. When I ran the job I got the same issue. – user20881825 Jan 18 '23 at 06:18
  • 1
    Thanks everyone for your help. We have found the root cause of the issue and solution for the same. Pease find below the details: Resolution: We had to replace the method “dataSet.exprEnc()“ with “dataSet.encoder()“ in the code and updated the spark-sql jar version to “spark-sql_2.12:3.2.0“ as well for the compilation. – user20881825 Jan 19 '23 at 06:34
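Based on the resolution in the comment above, the fix amounts to a one-line change in `doTest()`, sketched here as a diff. This assumes, as the resolution reports, that `Dataset.encoder()` (the public accessor in the Spark 3.x API) is binary-compatible with the patched Databricks build, whereas the internal `exprEnc()` getter's signature differs:

```diff
- Dataset<Row> singlePartition = dataSet.mapPartitions(this, dataSet.exprEnc()).repartition(1);
+ Dataset<Row> singlePartition = dataSet.mapPartitions(this, dataSet.encoder()).repartition(1);
```

Compiling against `spark-sql_2.12:3.2.0` (rather than a SNAPSHOT build) ensures the compile-time and runtime signatures agree.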

0 Answers