I have .txt and .csv files in my storage account. I want to delete only the .txt files. How can I do that in Databricks using dbutils.fs.rm(), or by any other means?
Does this answer your question? [How to move files of same extension in databricks files system?](https://stackoverflow.com/questions/50761539/how-to-move-files-of-same-extension-in-databricks-files-system) – mck Mar 19 '21 at 10:57
1 Answer
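If it is a one-time thing, I tend to use the Unix equivalent. Use `*.txt` as the pattern to match every .txt file in the folder, or a narrower glob such as `*delete_files*.txt`: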
%sh
rm -rf /dbfs/mnt/<your-path>/*delete_files*.txt
Prefix your existing `/mnt` paths with `/dbfs` (so `/mnt/<your-path>` becomes `/dbfs/mnt/<your-path>`) to access them through the underlying host filesystem.
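The `/dbfs` prefix is only a path mapping: the same file is addressed as `dbfs:/mnt/<your-path>/a.txt` (or `/mnt/<your-path>/a.txt`) by dbutils, and as `/dbfs/mnt/<your-path>/a.txt` by `os` and `%sh`. A minimal sketch of that mapping in both directions; `to_fuse_path` and `to_dbfs_path` are hypothetical helper names, not part of dbutils:

```python
DBFS_SCHEME = "dbfs:"
FUSE_PREFIX = "/dbfs"


def to_fuse_path(path: str) -> str:
    """Map 'dbfs:/mnt/x/a.txt' or '/mnt/x/a.txt' to '/dbfs/mnt/x/a.txt' (for os / %sh)."""
    if path.startswith(DBFS_SCHEME):
        path = path[len(DBFS_SCHEME):]
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")
    return FUSE_PREFIX + path


def to_dbfs_path(path: str) -> str:
    """Map '/dbfs/mnt/x/a.txt' to 'dbfs:/mnt/x/a.txt' (for dbutils.fs)."""
    if not path.startswith(FUSE_PREFIX + "/"):
        raise ValueError(f"expected a path under /dbfs/, got {path!r}")
    # Same as slicing off the first 5 characters ("/dbfs"), plus an explicit scheme
    return DBFS_SCHEME + path[len(FUSE_PREFIX):]
```

The explicit prefix check makes a path that lacks `/dbfs` fail loudly instead of being silently truncated by a fixed-width slice.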
If you want to do it on a regular basis or as part of your job's execution, you can use the function below:
import fnmatch
import os

def run_os_scandir(directory, pattern="*"):
    # `pattern` is a shell-style glob such as "*.txt"; it is matched with fnmatch,
    # because re.compile() rejects globs like "*delete_files*.txt"
    matched = []
    for entry in os.scandir(directory):
        # Only standalone files are matched; remove the `not entry.is_dir()` check to include folders
        if not entry.is_dir() and fnmatch.fnmatch(entry.name, pattern):
            matched.append(entry.path)
    return matched

# Usage. Note: the function only works if you prefix your mount path(s) with /dbfs
delete_file_lst = run_os_scandir('/dbfs/mnt/<your-path>/', '*delete_files*.txt')
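`os.scandir` only looks at the top level of the folder. If the .txt files can also sit in subfolders, a recursive variant along these lines should work. It is a sketch that swaps in `os.walk` plus `fnmatch.filter`, and `find_files_recursive` is a hypothetical name; it returns `/dbfs/...` paths just like the function above:

```python
import fnmatch
import os


def find_files_recursive(directory, pattern="*.txt"):
    """Walk `directory` and all of its subfolders and return the paths of
    files whose names match the glob `pattern`. Nothing is deleted here."""
    matched = []
    for root, _dirs, files in os.walk(directory):
        # `files` holds only non-directory entries of the current folder
        for name in fnmatch.filter(files, pattern):
            matched.append(os.path.join(root, name))
    return sorted(matched)
```

For example, `find_files_recursive('/dbfs/mnt/<your-path>/', '*.txt')` lists every .txt file under that mount and leaves the .csv files out. Reviewing the returned list before deleting anything is a cheap safeguard.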
Once you have the list of files, you can remove them with either dbutils or the standard os module:
# dbutils: f[5:] strips the leading "/dbfs", so dbutils receives a DBFS path such as /mnt/...
for f in delete_file_lst:
    dbutils.fs.rm(f[5:])

# os: works directly on the /dbfs paths
import os
for f in delete_file_lst:
    os.remove(f)
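If you would rather stay entirely within dbutils, as the question asks, you can list the folder with `dbutils.fs.ls` and filter on the entry names before calling `dbutils.fs.rm`. A sketch, assuming each FileInfo returned by `dbutils.fs.ls` exposes `path` and `name`, and that directories are listed with a trailing `/` in their name. `delete_files_with_extension` is a hypothetical helper that takes the listing and the remove function as arguments, so it can also be tried outside Databricks:

```python
def delete_files_with_extension(entries, remove, extension=".txt", dry_run=False):
    """Delete the entries whose name ends with `extension`.

    entries: iterable of objects with `.path` and `.name`
             (assumed: the FileInfo list from dbutils.fs.ls)
    remove:  callable taking one path (assumed: dbutils.fs.rm)
    Returns the paths that were removed, or that would be with dry_run=True.
    """
    targets = [
        e.path
        for e in entries
        # Directory names are assumed to end with "/", so they are skipped
        if not e.name.endswith("/") and e.name.endswith(extension)
    ]
    if not dry_run:
        for path in targets:
            remove(path)
    return targets


# In a Databricks notebook (dbutils only exists there):
# listing = dbutils.fs.ls("dbfs:/mnt/<your-path>/")
# print(delete_files_with_extension(listing, dbutils.fs.rm, dry_run=True))  # preview
# delete_files_with_extension(listing, dbutils.fs.rm)                       # delete
```

Running once with `dry_run=True` shows exactly which .txt files would go before anything is removed.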

Vaebhav