4

I have saved a Parquet file from Spark using the DataFrame.saveAsParquet() command.

How can I delete this file from Python code?

David
guptashail

2 Answers

5

This Parquet "file" will actually be a directory. If it lives on the local file system, you can delete the directory and the files in it like this:

import shutil
shutil.rmtree('/folder_name')
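If the path may be missing, or may turn out to be a single file rather than a directory, a small guard avoids an exception from shutil.rmtree. The following is only a sketch for local paths: the helper name delete_parquet_dir and the demo directory layout are made up for illustration and are not part of Spark.

```python
import shutil
import tempfile
from pathlib import Path


def delete_parquet_dir(path):
    """Delete a local Parquet output; return True if something was removed."""
    target = Path(path)
    if not target.exists():
        return False  # nothing to delete
    if target.is_dir():
        shutil.rmtree(target)  # removes the directory and every part file in it
    else:
        target.unlink()  # output written as a single file
    return True


# Usage on a throwaway directory that mimics a Spark output folder
demo = Path(tempfile.mkdtemp()) / "data.parquet"
demo.mkdir()
(demo / "_SUCCESS").touch()
(demo / "part-00000.parquet").touch()
print(delete_parquet_dir(demo))  # True
print(demo.exists())             # False
```

This only touches the file system of the machine running the script, so it will not help if the data was written to HDFS or another remote store.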
Community
David
0

Since @bsplosion mentioned HDFS, here is how you could do it in a PySpark script:

import subprocess

print("Deletion code:", subprocess.call(["hadoop", "fs", "-rm", "-r", "-skipTrash", "hdfs:/your/data/path"]))

# hadoop     - calls the hadoop command-line client
# fs         - selects Hadoop's file system shell
# -rm        - the remove command
# -r         - recursive removal, needed to remove the entire directory
# -skipTrash - bypasses the trash and deletes everything immediately

This prints Deletion code: 0 if the command succeeded, otherwise a nonzero code (the Hadoop documentation describes the error code as -1). You can read more about Hadoop's -rm in the FileSystem shell documentation.
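To act on the result instead of only printing it, the same command can be wrapped in a function that reports success and shows the error text. This is a minimal sketch: build_rm_command and delete_hdfs_path are hypothetical helper names, and it assumes the hadoop client is on the PATH of the machine running the script.

```python
import subprocess


def build_rm_command(path, skip_trash=True):
    """Build the same 'hadoop fs -rm -r' argument list used above."""
    cmd = ["hadoop", "fs", "-rm", "-r"]
    if skip_trash:
        cmd.append("-skipTrash")  # delete immediately instead of moving to trash
    cmd.append(path)
    return cmd


def delete_hdfs_path(path, skip_trash=True):
    """Run the removal; return True when the exit code is 0."""
    result = subprocess.run(
        build_rm_command(path, skip_trash),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )
    if result.returncode != 0:
        print("Deletion failed:", result.stderr.strip())
    return result.returncode == 0
```

Calling delete_hdfs_path("hdfs:/your/data/path") would then return True or False. Passing skip_trash=False keeps the default behaviour of moving the data to the trash where the cluster has it enabled.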

Markus