
I have installed Apache Spark and PyTorch on my Windows machine.

SparkSession - in-memory

Version: v3.1.2
Master: local[*]
AppName: pyspark-shell

Spark works fine until I try to write to a file, e.g.:


df1.write.csv('df1.csv')

It creates an empty folder, df1.csv, which is shared with other people.

The error is as follows:

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-5-c2f611feb31d> in <module>
----> 1 df1.write.csv('df1.csv')

~\Anaconda3\lib\site-packages\pyspark\sql\readwriter.py in csv(self, path, mode, compression, sep, quote, escape, header, nullValue, escapeQuotes, quoteAll, dateFormat, timestampFormat, ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, charToEscapeQuoteEscaping, encoding, emptyValue, lineSep)
  1370                        charToEscapeQuoteEscaping=charToEscapeQuoteEscaping,
  1371                        encoding=encoding, emptyValue=emptyValue, lineSep=lineSep)
-> 1372         self._jwrite.csv(path)
  1373 
  1374     def orc(self, path, mode=None, partitionBy=None, compression=None):

~\Anaconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
  1303         answer = self.gateway_client.send_command(command)
  1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
  1306 
  1307         for temp_arg in temp_args:

~\Anaconda3\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
   109     def deco(*a, **kw):
   110         try:
--> 111             return f(*a, **kw)
   112         except py4j.protocol.Py4JJavaError as e:
   113             converted = convert_exception(e.java_exception)

~\Anaconda3\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
   326                 raise Py4JJavaError(
   327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
   329             else:
   330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o29.csv.
: org.apache.spark.SparkException: Job aborted.
   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
   at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
...
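The trace above is cut off at `...`, so the underlying reason is not visible. In Java stack traces the real cause usually appears further down in `Caused by:` lines. A minimal sketch for pulling those lines out of the error text follows; `root_causes` is a hypothetical helper name, and the sample trace is made up purely to show the input shape.

```python
def root_causes(trace_text):
    """Collect the 'Caused by:' lines from a Java stack trace given as text."""
    prefix = "Caused by:"
    causes = []
    for line in trace_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Keep only the exception class and message after the prefix.
            causes.append(stripped[len(prefix):].strip())
    return causes


# Made-up trace text (NOT from the real error), only to show the format.
sample = """org.apache.spark.SparkException: Job aborted.
    at example.Outer.run(Outer.scala:1)
Caused by: java.io.IOException: illustrative message only
    at example.Inner.run(Inner.scala:2)
"""
print(root_causes(sample))
```

Assuming the exception text carries the Java trace (as the output above suggests), this could be used as `except Exception as e: print(root_causes(str(e)))` around the `df1.write.csv(...)` call.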

When I try to write to a JSON file, it says 'An error occurred while calling o32.json.'

Any ideas how to fix this?

I have already resolved a previous error, which complained about a missing winutils.exe, by building it in Visual Studio for my Windows x64 system. Now I get the error above instead.
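Since winutils.exe was involved, one thing that might be worth ruling out is the Hadoop helper setup itself. The sketch below assumes the usual Windows layout where `HADOOP_HOME` points at a folder whose `bin` subfolder holds `winutils.exe` and `hadoop.dll`; whether this is the actual cause here cannot be told from the truncated trace. `check_hadoop_home` is a hypothetical helper, not part of Spark.

```python
import os
from pathlib import Path


def check_hadoop_home(env=None):
    """Return a list of problems found with the Hadoop helper files.

    Assumption: HADOOP_HOME/bin should contain winutils.exe and hadoop.dll.
    An empty list means nothing obvious is missing.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]

    bin_dir = Path(hadoop_home) / "bin"
    problems = []
    for name in ("winutils.exe", "hadoop.dll"):
        if not (bin_dir / name).is_file():
            problems.append(f"{name} not found in {bin_dir}")
    return problems
```

Calling `check_hadoop_home()` in the same notebook that runs Spark checks the environment the driver actually sees, which can differ from a separate terminal.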

OneCricketeer
Bluetail
  • What happens when you give an absolute path to a folder you have permissions to? – OneCricketeer Aug 20 '21 at 22:16
  • Also, this will always create a **folder**. If you want a single file, see https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv – OneCricketeer Aug 20 '21 at 22:17
  • With an absolute path 'C:/... /df1.csv' I get "Py4JJavaError: An error occurred while calling o34.csv. : org.apache.spark.SparkException: Job aborted.", so it's the same error. I want a folder with multiple files in it. – Bluetail Aug 22 '21 at 20:07
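To separate the permissions question raised in the comments from Spark itself, a sketch like the following could confirm that the target folder is writable before handing the path to Spark. `writable_output_path` and the `spark_out` folder name are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path


def writable_output_path(parent, name):
    """Return an absolute forward-slash path under `parent`.

    Raises OSError if this user cannot create files in `parent`.
    """
    parent = Path(parent).expanduser().resolve()
    parent.mkdir(parents=True, exist_ok=True)
    # Creating (and automatically deleting) a temp file is a direct
    # test of write permission in this folder.
    with tempfile.TemporaryFile(dir=parent):
        pass
    return (parent / name).as_posix()


out = writable_output_path(Path(tempfile.gettempdir()) / "spark_out", "df1_csv")
print(out)

# With a live session, the write itself would then be:
# df1.write.mode("overwrite").option("header", True).csv(out)
```

The `mode("overwrite")` in the commented call is there because an earlier failed attempt left a `df1.csv` folder behind, and the default save mode refuses to write to a path that already exists.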
