I'm reading a CSV file and converting it to Parquet.
Read:

    variable = spark.read.csv(r'C:\Users\xxxxx.xxxx\Desktop\archive\test.csv', sep=';', inferSchema=True, header=True)
Writing to Parquet:

    variable.write.parquet(
        path=r'C:\Users\xxxxx.xxxx\Desktop\archive\parquet\new.parquet',
        # or: path=r'C:\Users\xxxxx.xxxx\Desktop\archive\parquet'
        # (a raw string can't end in a single backslash, so the trailing '\' is dropped)
        mode='overwrite',
    )
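For reference, here's a minimal, self-contained version of what I'm running (the SparkSession builder settings are my own reconstruction, since I didn't paste that part; paths are anonymized the same way as above):

    from pyspark.sql import SparkSession

    # Plain local session; nothing Windows-specific is configured here
    spark = SparkSession.builder.appName('csv-to-parquet').getOrCreate()

    # Read the semicolon-delimited CSV, using the first row as the header
    # and letting Spark infer the column types
    variable = spark.read.csv(
        r'C:\Users\xxxxx.xxxx\Desktop\archive\test.csv',
        sep=';',
        inferSchema=True,
        header=True,
    )

    # Write it back out as Parquet, replacing any previous output
    variable.write.parquet(
        path=r'C:\Users\xxxxx.xxxx\Desktop\archive\parquet\new.parquet',
        mode='overwrite',
    )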
Both paths give the same error:
Py4JJavaError: An error occurred while calling o186.parquet.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:651)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:288)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98
Job aborted due to stage failure:
Task 1 in stage 14.0 failed 1 times, most recent failure: Lost task 1.0 in stage 14.0 (TID 63)
(XXXXX-xxxx.xxx.local executor driver): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\xxxx.xxxx\Desktop\xxx.parquet_temporary\0_temporary\attempt_202304111306381850890757855117295_0014_m_000001_63\part-00001-1ea07aa8-0302-492c-993c-86ce32f575d8-c000.snappy.parquet
In Google Colab the exact same code works perfectly, without changing anything. I just want to know why it doesn't work on my Windows 10 machine, and what I can do to fix it.