
I have .txt and .csv files in my storage account. I want to delete only the .txt files. How can I do that in Databricks using dbutils.fs.rm(), or by any other means?

Blue Clouds
Does this answer your question? [How to move files of same extension in databricks files system?](https://stackoverflow.com/questions/50761539/how-to-move-files-of-same-extension-in-databricks-files-system) – mck Mar 19 '21 at 10:57

1 Answer


I tend to use the *nix equivalent if it is a one-time thing:

%sh

rm -rf /dbfs/mnt/<your-path>/*delete_files*.txt

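Prefix your existing /mnt paths with /dbfs to reach the same files through the driver's local filesystem. To delete every .txt file rather than a subset, use *.txt as the pattern.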
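Since the /dbfs prefix is what separates a local path (for %sh, os, open) from a DBFS path (for dbutils), a pair of small helpers can make the conversion explicit. This is only a sketch; the names to_local_path and to_dbfs_path are hypothetical and not part of any Databricks API.

```python
# Hypothetical helpers (not a Databricks API): convert between DBFS-style
# paths used by dbutils and the /dbfs-prefixed paths seen by os and %sh.
DBFS_LOCAL_PREFIX = "/dbfs"


def to_local_path(dbfs_path):
    """'/mnt/data/a.txt' or 'dbfs:/mnt/data/a.txt' -> '/dbfs/mnt/data/a.txt'."""
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return DBFS_LOCAL_PREFIX + "/" + dbfs_path.lstrip("/")


def to_dbfs_path(local_path):
    """'/dbfs/mnt/data/a.txt' -> '/mnt/data/a.txt'; other paths are returned unchanged."""
    if local_path == DBFS_LOCAL_PREFIX:
        return "/"
    if local_path.startswith(DBFS_LOCAL_PREFIX + "/"):
        return local_path[len(DBFS_LOCAL_PREFIX):]
    return local_path
```

to_dbfs_path does the same job as slicing off the first five characters, but it leaves paths that do not start with /dbfs/ untouched.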

If you need to do this regularly, or as part of a job, you can use the function below:

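import os
import re

def run_os_scandir(directory, pattern=None):
    # Compile the regex once; with no pattern, every file name matches.
    regex = re.compile(pattern) if pattern else None
    matches = []
    for entry in os.scandir(directory):
        #### If the files you are looking for are standalone files, keep (not is_dir); otherwise remove the not condition
        if not entry.is_dir() and (regex is None or regex.match(entry.name)):
            matches.append(entry.path)
    return matches

#### Usage. Note: the function works only if you add /dbfs to your mount path(s).
#### The pattern is a regular expression, not a shell glob ('*delete_files*.txt' would fail to compile).

delete_file_lst = run_os_scandir('/dbfs/mnt/<your-path>/', r'.*delete_files.*\.txt$')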
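If shell-style wildcards such as *.txt read more naturally than regular expressions, the same scan can be sketched with the standard fnmatch module. The function name find_files_by_glob and its default pattern are assumptions made for illustration.

```python
import fnmatch
import os


def find_files_by_glob(directory, glob_pattern="*.txt"):
    """Return sorted paths of regular files in `directory` whose names match a shell-style pattern."""
    matches = []
    for entry in os.scandir(directory):
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
        if entry.is_file() and fnmatch.fnmatchcase(entry.name, glob_pattern):
            matches.append(entry.path)
    return sorted(matches)
```

Like the regex version, it expects a /dbfs-prefixed path when pointed at a mount, for example find_files_by_glob('/dbfs/mnt/<your-path>/', '*.txt').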

Once you have the list of files, you can remove them with either the standard os module or dbutils:

### dbutils: f[5:] removes the leading /dbfs from the file path
for f in delete_file_lst:
    dbutils.fs.rm(f[5:])

### os: works on the /dbfs paths directly
for f in delete_file_lst:
    os.remove(f)
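For completeness, the question's original ask (dbutils only, no /dbfs prefix) could be sketched as below. It assumes the object passed in behaves like dbutils.fs, that is, ls() returns entries with name and path attributes and rm() takes a path. The function name delete_by_extension and the dry_run flag are hypothetical.

```python
def delete_by_extension(fs, directory, extension=".txt", dry_run=True):
    """List `directory` with fs.ls and remove entries whose name ends with `extension`.

    `fs` is assumed to behave like dbutils.fs. With dry_run=True nothing is
    deleted; the matching paths are only returned for inspection.
    """
    targets = [info.path for info in fs.ls(directory) if info.name.endswith(extension)]
    if not dry_run:
        for path in targets:
            fs.rm(path)
    return targets


# In a Databricks notebook, where dbutils is predefined:
# delete_by_extension(dbutils.fs, "/mnt/<your-path>/", ".txt", dry_run=False)
```

Running it once with the default dry_run=True is a cheap way to check the list before anything is removed.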
Vaebhav