19

I am facing a FileNotFoundException when I try to move a file using * (a wildcard) in DBFS. Both the source and destination directories are in DBFS. The source file, named "test_sample.csv", is available in the DBFS directory, and I am running the command below from a notebook cell:

dbutils.fs.mv("dbfs:/usr/krishna/sample/test*.csv", "dbfs:/user/abc/Test/Test.csv")

Error:

java.io.FileNotFoundException: dbfs:/usr/krishna/sample/test*.csv

I appreciate any help. Thanks.

Krishna Reddy

4 Answers

21

Wildcards are currently not supported with dbutils. You can move the whole directory:

dbutils.fs.mv("dbfs:/tmp/test", "dbfs:/tmp/test2", recurse=True)

or just a single file:

dbutils.fs.mv("dbfs:/tmp/test/test.csv", "dbfs:/tmp/test2/test2.csv")

As mentioned in the comments below, you can use Python to implement this wildcard logic yourself. See also the code examples in my other answer below.
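For example, a minimal sketch of that wildcard logic could look like this (the paths and pattern are taken from the question; adjust them as needed):

import fnmatch
import os

src_dir = 'dbfs:/usr/krishna/sample/'   # source directory
tgt_dir = 'dbfs:/user/abc/Test/'        # target directory
pattern = 'test*.csv'                   # the wildcard you wanted to use

# List the directory, keep only the entries matching the pattern,
# then move each match individually with dbutils.fs.mv.
for entry in dbutils.fs.ls(src_dir):
  if fnmatch.fnmatch(entry.name, pattern):
    dbutils.fs.mv(entry.path, os.path.join(tgt_dir, entry.name))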

Hauke Mallow
  • I must use a wildcard in my case since there are many files in the same directory that are not needed; I need to move only the CSV files in that directory. Meanwhile I have found a workaround in my PySpark code: fileList = dbutils.fs.ls(dir); for files in fileList: if files.name.endswith("csv"): filename = files.path; dbutils.fs.mv(filename, ''). – Krishna Reddy Jun 10 '18 at 17:43
  • Use a for loop: list the directory and move the files one at a time. Are wildcards still not allowed? – Itachi Apr 15 '19 at 05:43
  • Note for R users: when using 'recurse=True' the 'True' must be capitalised, i.e. recurse=TRUE. – Sanchez333 Jul 17 '23 at 13:47
5

Since wildcards are not allowed, we need to make it work this way (list the files and then move or copy them; a slightly more traditional approach):

import os

def db_list_files(file_path, file_prefix):
  # List the directory and keep only the files whose name starts with the given prefix.
  file_list = [file.path for file in dbutils.fs.ls(file_path)
               if os.path.basename(file.path).startswith(file_prefix)]
  return file_list

files = db_list_files('dbfs:/your/src_dir', 'foobar')

# Copy each matching file into the target directory, keeping its original name.
for file in files:
  dbutils.fs.cp(file, os.path.join('dbfs:/your/tgt_dir', os.path.basename(file)))
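
If you want to match by suffix instead (e.g. only the .csv files, as in the question), a small variation of the same approach works (the directories here are placeholders):

# Move every CSV file from the source directory to the target directory.
for file in dbutils.fs.ls('dbfs:/your/src_dir'):
  if file.name.endswith('.csv'):
    dbutils.fs.mv(file.path, os.path.join('dbfs:/your/tgt_dir', file.name))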
1

If you run your code on a Databricks cluster, you can access DBFS through the node's file system. I'm not sure whether it lists all the objects in the background and then filters, but at least you can use wildcards. E.g. from a Databricks notebook:

%sh
ls /dbfs/cluster-logs/*/driver/log4j-2021-09-01*
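
The same wildcard also works from Python via the /dbfs FUSE mount, assuming it is available on your cluster (a sketch reusing the log path from above):

import glob

# The /dbfs mount exposes DBFS to ordinary Python file APIs, so glob handles the wildcard.
for path in glob.glob('/dbfs/cluster-logs/*/driver/log4j-2021-09-01*'):
  print(path)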
ruloweb
0
dbutils.fs.mv("file:/<source>", "dbfs:/<destination>", recurse=True)

Use the above command to move a local folder to dbfs.
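
For instance, with hypothetical paths (the "file:/" scheme refers to the driver node's local file system):

dbutils.fs.mv("file:/tmp/my_local_folder", "dbfs:/user/abc/my_folder", recurse=True)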