19

I am facing a FileNotFoundException when I try to move a file using * (a wildcard) in DBFS. Both the source and destination directories are in DBFS. The source file, named "test_sample.csv", is available in the DBFS directory, and I am running the command below from a notebook cell:

dbutils.fs.mv("dbfs:/usr/krishna/sample/test*.csv", "dbfs:/user/abc/Test/Test.csv")

Error:

java.io.FileNotFoundException: dbfs:/usr/krishna/sample/test*.csv

I appreciate any help. Thanks.

Krishna Reddy

4 Answers

21

Wildcards are currently not supported with dbutils. You can move the whole directory:

dbutils.fs.mv("dbfs:/tmp/test", "dbfs:/tmp/test2", recurse=True)

or just a single file:

dbutils.fs.mv("dbfs:/tmp/test/test.csv", "dbfs:/tmp/test2/test2.csv")

As mentioned in the comments below, you can use Python to implement this wildcard logic yourself. See also the code examples in my other answer below.
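For example, a minimal sketch of that wildcard logic could look like this (the paths and pattern are taken from the question; adjust them as needed):

import fnmatch
import os

src_dir = 'dbfs:/usr/krishna/sample/'   # source directory
tgt_dir = 'dbfs:/user/abc/Test/'        # target directory
pattern = 'test*.csv'                   # the wildcard you wanted to use

# List the directory, keep only the entries matching the pattern,
# then move each match individually with dbutils.fs.mv.
for entry in dbutils.fs.ls(src_dir):
  if fnmatch.fnmatch(entry.name, pattern):
    dbutils.fs.mv(entry.path, os.path.join(tgt_dir, entry.name))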

Hauke Mallow
  • I must use a wildcard in my case since there are many files in the same directory that are not needed; I need to move only the CSV files in that directory. Meanwhile I have found a workaround in my PySpark code: fileList = dbutils.fs.ls(dir); for files in fileList: if files.name.endswith("csv"): filename = files.path; dbutils.fs.mv(filename, ''). – Krishna Reddy Jun 10 '18 at 17:43
  • Use a for loop: list the directory and move the files one at a time. Are wildcards still not allowed? – Itachi Apr 15 '19 at 05:43
  • Note for R users: when using 'recurse=True' the 'True' must be capitalised, i.e. recurse=TRUE. – Sanchez333 Jul 17 '23 at 13:47
5

Since wildcards are not allowed, we need to make it work this way (list the files and then move or copy them; a slightly more traditional approach):

import os

def db_list_files(file_path, file_prefix):
  # List the directory and keep only the files whose name starts with the given prefix.
  file_list = [file.path for file in dbutils.fs.ls(file_path)
               if os.path.basename(file.path).startswith(file_prefix)]
  return file_list

files = db_list_files('dbfs:/your/src_dir', 'foobar')

# Copy each matching file into the target directory, keeping its original name.
for file in files:
  dbutils.fs.cp(file, os.path.join('dbfs:/your/tgt_dir', os.path.basename(file)))
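
If you want to match by suffix instead (e.g. only the .csv files, as in the question), a small variation of the same approach works (the directories here are placeholders):

# Move every CSV file from the source directory to the target directory.
for file in dbutils.fs.ls('dbfs:/your/src_dir'):
  if file.name.endswith('.csv'):
    dbutils.fs.mv(file.path, os.path.join('dbfs:/your/tgt_dir', file.name))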
1

If you run your code on a Databricks cluster, you can access DBFS through the node's file system. I'm not sure whether it lists all the objects in the background and then filters, but at least you can use wildcards. E.g. from a Databricks notebook:

%sh
ls /dbfs/cluster-logs/*/driver/log4j-2021-09-01*
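
The same wildcard also works from Python via the /dbfs FUSE mount, assuming it is available on your cluster (a sketch reusing the log path from above):

import glob

# The /dbfs mount exposes DBFS to ordinary Python file APIs, so glob handles the wildcard.
for path in glob.glob('/dbfs/cluster-logs/*/driver/log4j-2021-09-01*'):
  print(path)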
ruloweb
0
dbutils.fs.mv("file:/<source>", "dbfs:/<destination>", recurse=True)

Use the above command to move a local folder to dbfs.
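
For instance, with hypothetical paths (the "file:/" scheme refers to the driver node's local file system):

dbutils.fs.mv("file:/tmp/my_local_folder", "dbfs:/user/abc/my_folder", recurse=True)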