I have installed Apache Spark and PyTorch on my Windows machine.
SparkSession - in-memory
Version: v3.1.2
Master: local[*]
AppName: pyspark-shell
Spark works fine until I try to write to a file, e.g.
df1.write.csv('df1.csv')
This creates an empty folder, df1.csv, which is shared with other people, but no data is written.
The error is as follows:
Py4JJavaError Traceback (most recent call last)
<ipython-input-5-c2f611feb31d> in <module>
----> 1 df1.write.csv('df1.csv')
~\Anaconda3\lib\site-packages\pyspark\sql\readwriter.py in csv(self, path, mode, compression, sep, quote, escape, header, nullValue, escapeQuotes, quoteAll, dateFormat, timestampFormat, ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, charToEscapeQuoteEscaping, encoding, emptyValue, lineSep)
1370 charToEscapeQuoteEscaping=charToEscapeQuoteEscaping,
1371 encoding=encoding, emptyValue=emptyValue, lineSep=lineSep)
-> 1372 self._jwrite.csv(path)
1373
1374 def orc(self, path, mode=None, partitionBy=None, compression=None):
~\Anaconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
~\Anaconda3\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
~\Anaconda3\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o29.csv.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
...
When I try to write to a JSON file instead, it says 'An error occurred while calling o32.json.'
Any ideas how to fix?
I have already resolved a previous error about a missing winutils.exe by building it in Visual Studio for my Windows x64 system, so now I'm stuck on this one.
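In case it helps, here is a minimal sketch of how I'm sanity-checking my Hadoop native setup. I've read that Spark writes on Windows need hadoop.dll in addition to winutils.exe; the C:\hadoop default below is just an assumption for my machine, and HADOOP_HOME should point at wherever the binaries actually live:

```python
import os
from pathlib import Path

def check_hadoop_setup(hadoop_home):
    """Report whether the native helpers Spark needs on Windows are present."""
    bin_dir = Path(hadoop_home) / "bin"
    return {
        # winutils.exe provides POSIX-style file operations on Windows
        "winutils.exe": (bin_dir / "winutils.exe").exists(),
        # hadoop.dll is reportedly also required by the native I/O path when writing
        "hadoop.dll": (bin_dir / "hadoop.dll").exists(),
    }

# C:\hadoop is an assumed default -- adjust to your own install location
print(check_hadoop_setup(os.environ.get("HADOOP_HOME", r"C:\hadoop")))
```

Both entries come back True for me, so I'm not sure what else could be missing.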