Can someone please help with the Scala equivalent of the Python code below? It recursively lists all the files in nested folders in Azure storage from Databricks.

def deep_ls(path: str):
    # Yield every file under `path`; directory paths end with '/'
    for x in dbutils.fs.ls(path):
        if not x.path.endswith('/'):
            yield x
        else:
            for y in deep_ls(x.path):
                yield y


from pprint import pprint
files = list(deep_ls("srcpath/2021/06/16/"))
for x in files:
  df = x.name
  pprint(df)

Thank you.

The code I have tried:

def deep_ls(path: String) = {
   
    for (x <-  dbutils.fs.ls(path)){
        if (x.path(-1) != '/') {
            return x
        }
          else{
            for (y <- deep_ls(x.path)){
                return y
            }
          }
    }
}

The error message:

command-3888229438512929:5: error: method deep_ls has return statement; needs result type
                return x
                ^
    command-3888229438512929:8: error: recursive method deep_ls needs result type
                for (y <- deep_ls(x.path)){
                          ^

After adding a result type to the function, I get the errors below:

command-3888229438512929:6: error: type mismatch;
 found   : com.databricks.backend.daemon.dbutils.FileInfo
 required: String
            return x
                   ^
command-3888229438512929:10: error: type mismatch;
 found   : Char
 required: String
                return y
                       ^
command-3888229438512929:4: error: type mismatch;
 found   : Unit
 required: String
    for (x <-  dbutils.fs.ls(path)){
Alex Ott
SanjanaSanju

1 Answer

It seems you don't really need the laziness of the Python code (the generator). In that case, the equivalent Scala code would be something like this:

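def deep_ls(path: String): Seq[String] = {
  dbutils.fs.ls(path).flatMap { x =>
    // Scala strings do not support negative indices, so test the suffix instead
    if (!x.path.endsWith("/")) {
      Seq(x.path)      // file: keep its path (each listed entry is a FileInfo, not a String)
    } else {
      deep_ls(x.path)  // directory (path ends with '/'): recurse
    }
  }
}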
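If the laziness of the Python generator does matter (for instance with very large folder trees), a rough counterpart is to return an `Iterator` instead of a `Seq`. The following is only a sketch under stated assumptions: `LazyDeepLs`, `tree` and `ls` are hypothetical stand-ins so that the code runs outside Databricks, and `ls` returns plain path strings rather than real `dbutils.fs.ls` results.

```scala
object LazyDeepLs {
  // Made-up sample data standing in for real storage (assumption).
  val tree: Map[String, Seq[String]] = Map(
    "root/"     -> Seq("root/a.xml", "root/sub/"),
    "root/sub/" -> Seq("root/sub/b.xml")
  )

  // Hypothetical stand-in for dbutils.fs.ls, reduced to path strings.
  def ls(path: String): Seq[String] = tree.getOrElse(path, Seq.empty)

  // Iterator.flatMap is lazy: a subdirectory is only listed
  // once the iterator actually reaches it.
  def deepLs(path: String): Iterator[String] =
    ls(path).iterator.flatMap { p =>
      if (p.endsWith("/")) deepLs(p) // directory: recurse lazily
      else Iterator.single(p)        // file: emit its path
    }

  def main(args: Array[String]): Unit =
    deepLs("root/").foreach(println)
}
```

Note that the top-level folder is still listed as soon as `deepLs` is called; only the nested folders are deferred until the iterator is consumed.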
Gaël J
  • @Gaël J Thank you for the reply. I'm using this code in Databricks and getting the error below: `error: type mismatch; found : Seq[Object] required: Seq[String]` at `dbutils.fs.ls(path).flatMap { x =>` – SanjanaSanju Sep 23 '21 at 20:46
  • @Gaël J Please let me know if this works for you. – SanjanaSanju Sep 23 '21 at 20:50
  • I assumed in my answer that `dbutils.fs.ls` returns a list of `String`, but it probably does not. You can adjust accordingly. – Gaël J Sep 24 '21 at 10:59
  • It's a wrapped array: [FileInfo(path='dbfs:srcpath/2021/06/16/0/CTMEZZ_670006375380_MOV_1_8_HD.xml', name='CTMEZZ_670006375380_MOV_1_8_HD.xml', size=6023), – SanjanaSanju Sep 24 '21 at 12:47
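
Since the listing yields `FileInfo` entries rather than strings, as the last comment shows, one way to adjust the answer is to return `Seq[FileInfo]` and extract the names afterwards, as the Python code does. The sketch below is a minimal illustration and not the Databricks API itself: the `FileInfo` case class only models the `path`, `name` and `size` fields visible in the comment, and `DeepLsSketch`, `tree` and `ls` are hypothetical stand-ins with made-up data. In a notebook, `ls(path)` would presumably be replaced by `dbutils.fs.ls(path)`.

```scala
object DeepLsSketch {
  // Hypothetical stand-in for the Databricks FileInfo type; only the
  // fields visible in the comment above are modelled (assumption).
  case class FileInfo(path: String, name: String, size: Long)

  // Made-up directory tree used instead of real storage.
  val tree: Map[String, Seq[FileInfo]] = Map(
    "root/" -> Seq(
      FileInfo("root/a.xml", "a.xml", 10L),
      FileInfo("root/sub/", "sub/", 0L)
    ),
    "root/sub/" -> Seq(FileInfo("root/sub/b.xml", "b.xml", 20L))
  )

  // Stand-in for dbutils.fs.ls (assumes directory paths end with '/').
  def ls(path: String): Seq[FileInfo] = tree.getOrElse(path, Seq.empty)

  // The explicit result type is required because the method is recursive.
  def deepLs(path: String): Seq[FileInfo] =
    ls(path).flatMap { x =>
      if (x.path.endsWith("/")) deepLs(x.path) // directory: recurse
      else Seq(x)                              // file: keep the entry
    }

  def main(args: Array[String]): Unit =
    deepLs("root/").map(_.name).foreach(println)
}
```

Both branches of the `if` now have type `Seq[FileInfo]`, which avoids the `Seq[Object]` mismatch reported above.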