
I'm just trying out dotnet Spark. I modified the sample program to write the DataFrame contents to a parquet file. However, I am getting an exception that doesn't contain any helpful info. What may be causing the exception? Or is there somewhere the exception logs would be more helpful?

20/12/09 15:04:32 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
20/12/09 15:04:32 INFO Executor: Executor killed task 1.0 in stage 6.0 (TID 604), reason: Stage cancelled

[2020-12-09T07:04:32.6029517Z] [HIS2547] [Exception] [JvmBridge] JVM method execution failed: Nonstatic method parquet failed for class 22 when called with 1 arguments ([Index=1, Type=String, Value=myparquet1], )
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] args)
20/12/09 15:04:32 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID 604, localhost, executor driver): TaskKilled (Stage cancelled)

20/12/09 15:04:32 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
Exception saving to parquetSystem.Exception: JVM method execution failed: Nonstatic method parquet failed for class 22 when called with 1 arguments ([Index=1, Type=String, Value=myparquet1], )
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] args)
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object arg0)
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallNonStaticJavaMethod(JvmObjectReference objectId, String methodName, Object arg0)
   at Microsoft.Spark.Interop.Ipc.JvmObjectReference.Invoke(String methodName, Object arg0)
   at Microsoft.Spark.Sql.DataFrameWriter.Parquet(String path)
   at MySparkApp.Program.Main(String[] args) in C:\Users\Administrator\mySparkApp\Program.cs:line 46

This is my code:

using System;
using Microsoft.Spark.Sql;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            //BuildWebHost(args).Run();

            // Create a Spark session
            SparkSession spark = SparkSession
                .Builder()
                .AppName("word_count_sample1")
                .GetOrCreate();

            // Create the initial DataFrame from the input text file
            DataFrame dataFrame = spark.Read().Text(@"C:\Users\Administrator\mySparkApp\input.txt");

            // Count words: split each line into words, explode into one row
            // per word, then group by word and order by frequency
            DataFrame words = dataFrame
                .Select(Functions.Split(Functions.Col("value"), " ").Alias("words"))
                .Select(Functions.Explode(Functions.Col("words")).Alias("word"))
                .GroupBy("word")
                .Count()
                .OrderBy(Functions.Col("count").Desc());

            // Show results
            words.Show();

            try
            {
                //words.Write().Mode(SaveMode.Append).Parquet("parquet.wordcount");
                var dataFrameWriter = words.Write();
                dataFrameWriter.Mode(SaveMode.Overwrite); // Append does not work either
                dataFrameWriter.Parquet("myparquet1");
            }
            catch (Exception ex)
            {
                Console.WriteLine("Exception saving to parquet" + ex.ToString());
            }

            spark.Stop();
        }
    }
}

Basically, the code just creates the folder I specified in the parquet path parameter, but its contents are empty. If I use dotnet Spark to read a parquet file previously created by my Scala driver, it reads just fine; only the write from dotnet Spark does not work. Any help would be appreciated. Thank you!
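For reference, a minimal sketch of the read path that works, assuming the same SparkSession as in the program above; the path below is hypothetical, standing in for the folder the Scala driver wrote:

// Reading an existing parquet folder succeeds, while the write above fails.
// "scala-output.parquet" is a placeholder path, not the actual folder name.
DataFrame existing = spark.Read().Parquet(@"C:\Users\Administrator\mySparkApp\scala-output.parquet");
existing.Show();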

remondo
  • Are you running the Spark application on YARN or locally? – Vijay_Shinde Dec 09 '20 at 07:22
  • @Vijay_Shinde: I am running it locally using spark-submit. Actually, I simply followed the tutorial from https://dotnet.microsoft.com/learn/data/spark-tutorial/intro, plus I added a bit of code to write the DataFrame to a parquet file. – remondo Dec 09 '20 at 07:39
  • OK. If you have the Spark application ID, try this command to get the full logs: yarn logs -applicationId – Vijay_Shinde Dec 09 '20 at 07:54
  • @Vijay_Shinde I am not using YARN, though... just running it locally with spark-submit. – remondo Dec 09 '20 at 08:07
  • @remondo are you able to post the full log anywhere? – Ed Elliott Dec 09 '20 at 09:43
  • @EdElliott I added the full error log to my GitHub Gist: https://gist.github.com/raymond-ong/02bfc425772d6a33799332e2ecb38a95 You can search for the exceptions; the parquet-related exception is at line 2036. Thank you! – remondo Dec 10 '20 at 10:50
  • 1
    The error you are getting is when running a child process, 0x3FFFFECB (hex version of -1073741515) which means dll not found. Do you have spark with Hadoop and winutils or a separate Hadoop installation? Can you run winutils manually to see if you get the same error? – Ed Elliott Dec 11 '20 at 07:47
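A minimal sketch of that check in C#, assuming the usual Windows setup where Spark resolves winutils.exe via the HADOOP_HOME environment variable (the class name and messages below are illustrative, not part of any library):

using System;
using System.IO;

// Illustrative check: Spark on Windows expects winutils.exe under %HADOOP_HOME%\bin.
class WinutilsCheck
{
    static void Main()
    {
        string hadoopHome = Environment.GetEnvironmentVariable("HADOOP_HOME");
        if (string.IsNullOrEmpty(hadoopHome))
        {
            Console.WriteLine("HADOOP_HOME is not set.");
            return;
        }

        string winutils = Path.Combine(hadoopHome, "bin", "winutils.exe");
        // If winutils.exe is present but still exits with -1073741515 (0xC0000135),
        // a DLL it depends on (commonly hadoop.dll or the VC++ runtime) is likely missing.
        Console.WriteLine(File.Exists(winutils)
            ? "Found " + winutils
            : "winutils.exe not found at " + winutils);
    }
}

Running winutils.exe ls C:\ from a command prompt is another quick way to confirm that the binary itself starts cleanly.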

0 Answers