
I'm reading a CSV file and turning it into Parquet:

Reading the CSV:

variable = spark.read.csv(r'C:\Users\xxxxx.xxxx\Desktop\archive\test.csv', sep=';', inferSchema=True, header=True)

Writing to Parquet:

variable.write.parquet(
    path=r'C:\Users\xxxxx.xxxx\Desktop\archive\parquet\new.parquet',
    # or: r'C:\Users\xxxxx.xxxx\Desktop\archive\parquet'
    mode='overwrite',
)
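
For reference, `variable` comes from a SparkSession named `spark`. A minimal sketch of how that session is typically created locally; this setup is an assumption, since it is not shown in the original post:

from pyspark.sql import SparkSession

# Hypothetical local session setup; the original post does not show it.
spark = (
    SparkSession.builder
    .master('local[*]')
    .appName('csv-to-parquet')
    .getOrCreate()
)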

Both give the same error:

Py4JJavaError: An error occurred while calling o186.parquet.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:651)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:288)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98
Job aborted due to stage failure: 
Task 1 in stage 14.0 failed 1 times, most recent failure: Lost task 1.0 in stage 14.0 (TID 63) 
(XXXXX-xxxx.xxx.local executor driver): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\xxxx.xxxx\Desktop\xxx.parquet\_temporary\0\_temporary\attempt_202304111306381850890757855117295_0014_m_000001_63\part-00001-1ea07aa8-0302-492c-993c-86ce32f575d8-c000.snappy.parquet

In Google Colab the same code works perfectly, without any changes.

I just want to know why it doesn't work on my Windows 10 machine, and what I can do to fix it.

    One segment of your path has two slashes while others do not? Also, Spark will always write a directory, not a single file – OneCricketeer Apr 11 '23 at 13:08
  • Which version of Java are you using? 32-bit or 64-bit? – Abdennacer Lachiheb Apr 11 '23 at 14:26
  • @AbdennacerLachiheb, 64-bit. – Guilherme Apr 11 '23 at 14:45
  • @OneCricketeer I accidentally typed the double slash in the question; read it as a single \ – Guilherme Apr 11 '23 at 14:47
  • Also note that in Colab it worked perfectly – Guilherme Apr 11 '23 at 14:48
  • Which version of Java are you using? Try Java 8 – Abdennacer Lachiheb Apr 11 '23 at 14:53
  • @AbdennacerLachiheb yes, os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jre1.8.0_361" – Guilherme Apr 11 '23 at 14:56
  • "Job aborted" is not your actual error. Please find the logs of the failed executor in Spark UI. Java/OS versions should not matter until you see exact error messages related to them (Besides, Colab uses Linux. On Windows, you need `winutils.exe`, `hadoop.dll` files, for example, which are not included with Spark) – OneCricketeer Apr 11 '23 at 15:37
  • @OneCricketeer Yes, I downloaded winutils.exe from GitHub, and I'm using Hadoop 2.7, but it still doesn't run on Windows. – Guilherme Apr 11 '23 at 16:02
  • @OneCricketeer, the error in the Spark log: Job aborted due to stage failure: Task 1 in stage 14.0 failed 1 times, most recent failure: Lost task 1.0 in stage 14.0 (TID 63) (XXXXX-xxxx.xxx.local executor driver): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\xxxx.xxxx\Desktop\xxx.parquet\_temporary\0\_temporary\attempt_202304111306381850890757855117295_0014_m_000001_63\part-00001-1ea07aa8-0302-492c-993c-86ce32f575d8-c000.snappy.parquet – Guilherme Apr 11 '23 at 16:07
  • Looks like it tried to `chmod 0644` a file (using winutils), but then something was null in that command? But it did create a parquet file. The latest Pyspark uses Hadoop 3.x libraries, which will not work with Hadoop 2.7 resources, by the way. – OneCricketeer Apr 11 '23 at 23:00
  • Does this answer your question? [(null) entry in command string exception in saveAsTextFile() on Pyspark](https://stackoverflow.com/questions/40764807/null-entry-in-command-string-exception-in-saveastextfile-on-pyspark) – samkart Apr 12 '23 at 07:58
  • @OneCricketeer even after changing the versions as indicated, the same error persists: Py4JJavaError: An error occurred while calling o45.parquet. : org.apache.spark.SparkException: Job aborted. "With Windows it's quite complicated" – Guilherme Apr 12 '23 at 12:28
  • @samkart I followed this process previously, and it didn't work. I don't know if there's a solution hahahaha – Guilherme Apr 12 '23 at 12:30
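
Following up on the winutils.exe / hadoop.dll comments above, a minimal sketch of the Windows environment setup being suggested. The C:\hadoop path is an assumption (not from the thread); the winutils.exe / hadoop.dll build must match the Hadoop version bundled with your PySpark (3.x for recent releases, per the comment above), and these variables must be set before the SparkSession is created:

import os

# Assumed download location for winutils.exe and hadoop.dll (C:\hadoop\bin is hypothetical).
os.environ['HADOOP_HOME'] = r'C:\hadoop'
# hadoop.dll must be discoverable by the JVM, so prepend the bin folder to PATH.
os.environ['PATH'] = r'C:\hadoop\bin' + os.pathsep + os.environ['PATH']
# JAVA_HOME as already set in the comments above.
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jre1.8.0_361'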

0 Answers