101

Is there a good "scala-esque" (I guess I mean functional) way of recursively listing files in a directory? What about matching a particular pattern?

For example, recursively list all files matching "a*.foo" in c:\temp.

Nick Fortescue
  • 1
    [os-lib](https://github.com/lihaoyi/os-lib) provides an elegant Scala interface for filesystem operations like recursively listing files that match a pattern. See my answer for a simple one liner solution. Scala developers don't need to suffer with the low-level java.io and java.nio libraries that force you to write code that's unnecessarily verbose and complex. – Powers Dec 14 '20 at 15:10

23 Answers

124

Scala code typically uses Java classes for dealing with I/O, including reading directories. So you have to do something like:

import java.io.File
def recursiveListFiles(f: File): Array[File] = {
  val these = f.listFiles
  these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}

You could collect all the files and then filter using a regex:

myBigFileArray.filter(f => """.*\.html$""".r.findFirstIn(f.getName).isDefined)

Or you could incorporate the regex into the recursive search:

import scala.util.matching.Regex
def recursiveListFiles(f: File, r: Regex): Array[File] = {
  val these = f.listFiles
  val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
  good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_,r))
}
Rex Kerr
  • 7
    WARNING: I ran this code and sometimes f.listFiles returns null (don't know why but on my mac it does) and the recursiveListFiles function crashes. I'm not experienced enough to build an elegant null check in scala, but returning an empty array if these ==null worked for me. – Jan Nov 28 '10 at 21:27
  • 2
    @Jan - `listFiles` returns `null` if `f` doesn't point to a directory or if there's an IO error (at least according to the Java spec). Adding a null check is probably wise for production use. – Rex Kerr Feb 22 '11 at 17:21
  • @Rex Perhaps better than the `null` check, would be to have sanity check that `f` is a directory at the start of the function. This would help with readability, in that the meaning of the check would be very clear. E.g: `if (!f.isDirectory) return Array()` – Peter Schwarz Jul 13 '11 at 17:53
  • 5
    @Peter Schwarz - You _still_ need the null check, since it is possible for `f.isDirectory` to return true but `f.listFiles` to return `null`. For example, if you don't have permission to read the files, you'll get a `null`. Rather than having both checks, I'd just add the one null check. – Rex Kerr Jul 13 '11 at 19:58
  • 1
    In fact you only need the null check, as `f.listFiles` returns null when `!f.isDirectory`. – Duncan McGregor Dec 01 '11 at 11:53
  • 2
    Regarding the Null check, the most idiomatic way would be to convert the null to option and use map. So the assignment is val these = Option(f.listFiles) and the ++ operator is inside a map operation with a 'getOrElse' at the end – Or Peles Nov 25 '12 at 09:47
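The null handling discussed in these comments might be sketched as follows. This is only an illustration that reuses the `recursiveListFiles` name from the answer: wrapping `f.listFiles` in `Option` turns a `null` result (not a directory, an I/O error, or no read permission) into an empty array.

```scala
import java.io.File

// Sketch of the null-safe variant suggested in the comments above.
def recursiveListFiles(f: File): Array[File] = {
  // Option(null) is None, so a null from listFiles becomes an empty array.
  val these = Option(f.listFiles).getOrElse(Array.empty[File])
  these ++ these.filter(_.isDirectory).flatMap(d => recursiveListFiles(d))
}
```

Called on a plain file or an unreadable directory, this returns an empty array instead of throwing a NullPointerException.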
48

I would prefer a solution using Streams, because you can iterate over an infinite file system (Streams are lazily evaluated collections):

import java.io.File
import scala.collection.JavaConversions._

def getFileTree(f: File): Stream[File] =
        f #:: (if (f.isDirectory) f.listFiles().toStream.flatMap(getFileTree) 
               else Stream.empty)

Example search:

getFileTree(new File("c:\\main_dir")).filter(_.getName.endsWith(".scala")).foreach(println)
Andrey Tyukin
yura
  • 4
    Alternative syntax: `def getFileTree(f: File): Stream[File] = f #:: Option(f.listFiles()).toStream.flatten.flatMap(getFileTree)` – VasiliNovikov Feb 14 '14 at 16:00
  • 3
    I agree with your intent, but this your solution is pointless. listFiles() already returns a fully-evaluated array, which your then "lazily" evaluate on toStream. You need a stream form scratch, look for java.nio.file.DirectoryStream. – Daniel Langdon Oct 03 '14 at 16:03
  • 8
    @Daniel it's not absolutely strict, it recurses directories lazily. – Guillaume Massé Oct 18 '14 at 15:22
  • 3
    I shall try that right now on my infinite file system :-) – Brian Agnew Jun 05 '15 at 09:58
  • Beware: JavaConversions is now deprecated. Use JavaConverters and asScala decoration instread. – Suma Dec 19 '19 at 09:05
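For readers on newer Scala versions, the same idea might look like the sketch below. It assumes Scala 2.13 or later, where `LazyList` replaces the deprecated `Stream`, and it folds in the `Option` null guard from the comment above. No collection-converter import is needed, because only Scala collection methods are used.

```scala
import java.io.File

// Assumes Scala 2.13+: LazyList is the successor of the deprecated Stream.
// The tail after #:: is evaluated lazily, so a directory is only listed
// when the traversal actually reaches it.
def getFileTree(f: File): LazyList[File] =
  f #:: Option(f.listFiles()) // null (not a directory, or an I/O error) becomes None
    .toList
    .flatMap(_.toList)
    .to(LazyList)
    .flatMap(getFileTree)
```

As the comments point out, each individual `listFiles()` call is still strict. The laziness is in how far the recursion descends.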
34

As of Java 1.7, you should all be using java.nio. It offers close-to-native performance (java.io is very slow) and has some useful helpers.

But Java 1.8 introduces exactly what you are looking for:

import java.nio.file.{FileSystems, Files}
import scala.collection.JavaConverters._
val dir = FileSystems.getDefault.getPath("/some/path/here") 

Files.walk(dir).iterator().asScala.filter(Files.isRegularFile(_)).foreach(println)

You also asked for file matching. Try java.nio.file.Files.find and also java.nio.file.Files.newDirectoryStream.

See documentation here: http://docs.oracle.com/javase/tutorial/essential/io/walk.html
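As a rough sketch of the `Files.find` suggestion, the "a*.foo" pattern from the question could be handled with a glob `PathMatcher`. The helper name `findByGlob` is made up for this illustration, and the `scala.jdk.CollectionConverters` import assumes Scala 2.13 or later (on older versions, `scala.collection.JavaConverters` plays the same role).

```scala
import java.nio.file.{FileSystems, Files, Path}
import java.nio.file.attribute.BasicFileAttributes
import scala.jdk.CollectionConverters._

// Hypothetical helper: recursively collect regular files whose file name matches a glob.
def findByGlob(root: Path, glob: String): List[Path] = {
  val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
  val stream = Files.find(
    root,
    Int.MaxValue, // no depth limit
    (p: Path, attrs: BasicFileAttributes) =>
      attrs.isRegularFile && matcher.matches(p.getFileName)
  )
  // The stream holds open directory handles, so close it once the results are copied out.
  try stream.iterator().asScala.toList
  finally stream.close()
}
```

For the example in the question, the call would be something like `findByGlob(FileSystems.getDefault.getPath("c:\\temp"), "a*.foo")`.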

Martin
monzonj
  • i get: Error:(38, 32) value asScala is not a member of java.util.Iterator[java.nio.file.Path] Files.walk(dir).iterator().asScala.filter(Files.isRegularFile(_)).foreach(println) – stuart Sep 29 '17 at 03:05
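An error like the one in this comment typically shows up when the converters import is not in scope, although that is a guess from the message rather than something the commenter confirmed. A self-contained sketch, assuming Scala 2.13 or later (for `scala.jdk.CollectionConverters` and `scala.util.Using`) and a made-up helper name `regularFiles`, which also closes the stream returned by `Files.walk`:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._ // provides .asScala on java.util.Iterator
import scala.util.Using

// Hypothetical helper: list all regular files under dir, recursively.
def regularFiles(dir: Path): List[Path] =
  // Using.resource closes the java.util.stream.Stream when the block finishes.
  Using.resource(Files.walk(dir)) { stream =>
    // toList materializes the results before the stream is closed.
    stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
  }
```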
20
for (file <- new File("c:\\").listFiles) { processFile(file) }

http://langref.org/scala+java/files

thefourtheye
Phil
11

Scala is a multi-paradigm language. A good "scala-esque" way of iterating a directory would be to reuse existing code!

I'd consider using commons-io a perfectly scala-esque way of iterating a directory. You can use some implicit conversions to make it easier. For example:

import java.io.File
import org.apache.commons.io.filefilter.IOFileFilter
// Lets a plain File => Boolean function be passed where commons-io expects an IOFileFilter
implicit def newIOFileFilter(filter: File => Boolean): IOFileFilter = new IOFileFilter {
  def accept(file: File) = filter(file)
  def accept(dir: File, name: String) = filter(new File(dir, name))
}
ArtemGr
11

I like yura's stream solution, but it (and the others) recurses into hidden directories. We can also simplify by making use of the fact that listFiles returns null for a non-directory.

def tree(root: File, skipHidden: Boolean = false): Stream[File] = 
  if (!root.exists || (skipHidden && root.isHidden)) Stream.empty 
  else root #:: (
    root.listFiles match {
      case null => Stream.empty
      case files => files.toStream.flatMap(tree(_, skipHidden))
  })

Now we can list files

tree(new File(".")).filter(f => f.isFile && f.getName.endsWith(".html")).foreach(println)

or realise the whole stream for later processing

tree(new File("dir"), true).toArray
Duncan McGregor
9

No one has mentioned better-files yet: https://github.com/pathikrit/better-files

import better.files._

val dir = "src"/"test"
val matches: Iterator[File] = dir.glob("**/*.{java,scala}")
// above code is equivalent to:
dir.listRecursively.filter(f => f.extension == 
                      Some(".java") || f.extension == Some(".scala")) 
Phil
6

Apache Commons IO's FileUtils fits on one line, and is quite readable:

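import java.io.File
import scala.collection.JavaConversions._ // important for 'foreach'
import org.apache.commons.io.FileUtils

// Note the escaped backslash: "c:\temp" would contain a tab character
FileUtils.listFiles(new File("c:\\temp"), Array("foo"), true).foreach { f =>
  // process f here
}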
Renaud
  • I had to add type information: FileUtils.listFiles(new File("c:\temp"), Array("foo"), true).toArray(Array[File]()).foreach{ f => } – Jason Wheeler Nov 11 '13 at 20:59
  • It's not very useful on a case-sensitive file system as the supplied extensions must match case exactly. There doesn't appear to be a way to specify the ExtensionFileComparator. – Brent Faust Aug 21 '15 at 05:12
  • 1
    a workaround: provide Array("foo", "FOO", "png", "PNG" ) – Renaud Aug 21 '15 at 09:34
5

I personally like the elegance and simplicity of @Rex Kerr's proposed solution. But here is what a tail-recursive version might look like:

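import java.io.File
import scala.annotation.tailrec

def listFiles(file: File): List[File] = {
  @tailrec
  def listFiles(files: List[File], result: List[File]): List[File] = files match {
    case Nil => result
    case head :: tail if head.isDirectory =>
      // listFiles may return null (e.g. on an I/O error), hence the Option
      listFiles(Option(head.listFiles).map(_.toList ::: tail).getOrElse(tail), result)
    case head :: tail if head.isFile =>
      listFiles(tail, head :: result)
    case _ :: tail => // neither a directory nor a regular file (e.g. missing): skip it
      listFiles(tail, result)
  }
  listFiles(List(file), Nil)
}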
polbotinka
3

And here's a mixture of the stream solution from @DuncanMcGregor with the filter from @Rick-777:

  def tree( root: File, descendCheck: File => Boolean = { _ => true } ): Stream[File] = {
    require(root != null)
    def directoryEntries(f: File) = for {
      direntries <- Option(f.list).toStream
      d <- direntries
    } yield new File(f, d)
    val shouldDescend = root.isDirectory && descendCheck(root)
    ( root.exists, shouldDescend ) match {
      case ( false, _) => Stream.Empty
      case ( true, true ) => root #:: ( directoryEntries(root) flatMap { tree( _, descendCheck ) } )
      case ( true, false) => Stream( root )
    }   
  }

  def treeIgnoringHiddenFilesAndDirectories( root: File ) = tree( root, { !_.isHidden } ) filter { !_.isHidden }

This gives you a Stream[File] instead of a (potentially huge and very slow) List[File] while letting you decide which sorts of directories to recurse into with the descendCheck() function.

James Moore
3

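How about:

   def allFiles(path: File): List[File] = {
       // listFiles returns null for non-directories and on I/O errors
       val (dirs, files) = Option(path.listFiles).map(_.toList).getOrElse(Nil).partition(_.isDirectory)
       files ::: dirs.flatMap(allFiles)
   }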
Dino Fancellu
3

Scala has the library 'scala.reflect.io', which is considered experimental but does the job:

import scala.reflect.io.Path
Path(path) walkFilter { p => 
  // the glob "a*.foo" from the question corresponds to the regex ^a.*\.foo$
  p.isDirectory || """^a.*\.foo$""".r.findFirstIn(p.name).isDefined
}
roterl
3

Take a look at scala.tools.nsc.io

There are some very useful utilities there including deep listing functionality on the Directory class.

If I remember correctly, this was highlighted (possibly contributed) by retronym and was seen as a stopgap until io gets a fresh and more complete implementation in the standard library.

Don Mackenzie
3

The simplest Scala-only solution (if you don't mind requiring the Scala compiler library):

val path = scala.reflect.io.Path(dir)
scala.tools.nsc.io.Path.onlyFiles(path.walk).foreach(println)

Otherwise, @Renaud's solution is short and sweet (if you don't mind pulling in Apache Commons FileUtils):

import scala.collection.JavaConversions._  // enables foreach
import org.apache.commons.io.FileUtils
FileUtils.listFiles(dir, null, true).foreach(println)

Where dir is a java.io.File:

new File("path/to/dir")
Brent Faust
2

os-lib is the easiest way to recursively list files in Scala.

os.walk(os.pwd/"countries").filter(os.isFile(_))

Here's how to recursively list all the files that match the "a*.foo" pattern specified in the question:

os.walk(os.pwd/"countries").filter(_.segments.toList.last matches "a.*\\.foo")

os-lib is way more elegant and powerful than other alternatives. It returns os objects that you can easily move, rename, whatever. You don't need to suffer with the clunky Java libraries anymore.

Here's a code snippet you can run if you'd like to experiment with this library on your local machine:

os.makeDir(os.pwd/"countries")
os.makeDir(os.pwd/"countries"/"colombia")
os.write(os.pwd/"countries"/"colombia"/"medellin.txt", "q mas pues")
os.write(os.pwd/"countries"/"colombia"/"a_something.foo", "soy un rolo")
os.makeDir(os.pwd/"countries"/"brasil")
os.write(os.pwd/"countries"/"brasil"/"a_whatever.foo", "carnaval")
os.write(os.pwd/"countries"/"brasil"/"a_city.txt", "carnaval")

println(os.walk(os.pwd/"countries").filter(os.isFile(_))) will print this:

ArraySeq(
  /.../countries/brasil/a_whatever.foo, 
  /.../countries/brasil/a_city.txt, 
  /.../countries/colombia/a_something.foo, 
  /.../countries/colombia/medellin.txt)

os.walk(os.pwd/"countries").filter(_.segments.toList.last matches "a.*\\.foo") will return this:

ArraySeq(
  /.../countries/brasil/a_whatever.foo, 
  /.../countries/colombia/a_something.foo)

See the os-lib README for more details on how to use the library: https://github.com/lihaoyi/os-lib

Powers
1

Here's a similar solution to Rex Kerr's, but incorporating a file filter:

import java.io.File
def findFiles(fileFilter: (File) => Boolean = (f) => true)(f: File): List[File] = {
  val ss = f.list()
  val list = if (ss == null) {
    Nil
  } else {
    ss.toList.sorted
  }
  val visible = list.filter(_.charAt(0) != '.')
  val these = visible.map(new File(f, _))
  these.filter(fileFilter) ++ these.filter(_.isDirectory).flatMap(findFiles(fileFilter))
}

The method returns a List[File], which is slightly more convenient than Array[File]. It also ignores all hidden files and directories (i.e., those beginning with '.').

It's partially applied using a file filter of your choosing, for example:

val srcDir = new File( ... )
val htmlFiles = findFiles( _.getName endsWith ".html" )( srcDir )
Rick-777
1

It seems nobody has mentioned the scala-io library from scala-incubator...

import scalax.file.Path

Path.fromString("c:\\temp") ** "a*.foo"

Or with implicit

import scalax.file.ImplicitConversions.string2path

"c:\\temp" ** "a*.foo"

Or if you want implicit explicitly...

import scalax.file.Path
import scalax.file.ImplicitConversions.string2path

val dir: Path = "c:\\temp"
dir ** "a*.foo"

Documentation is available here: http://jesseeichar.github.io/scala-io-doc/0.4.3/index.html#!/file/glob_based_path_sets

draw
1

The deepFiles method of scala.reflect.io.Directory provides a pretty nice way of recursively getting all the files in a directory:

import scala.reflect.io.Directory
new Directory(f).deepFiles.filter(x => x.name.startsWith("a") && x.name.endsWith(".foo"))

deepFiles returns an iterator, so you can convert it to some other collection type if you don't need/want lazy evaluation.

0

This incantation works for me:

  def findFiles(dir: File, criterion: (File) => Boolean): Seq[File] = {
    if (dir.isFile) Seq()
    else {
      val (files, dirs) = dir.listFiles.partition(_.isFile)
      files.filter(criterion) ++ dirs.toSeq.map(findFiles(_, criterion)).foldLeft(Seq[File]())(_ ++ _)
    }
  }
Connor Doyle
0

You can use tail recursion for it:

object DirectoryTraversal {
  import java.io._

  def main(args: Array[String]): Unit = {
    val dir = new File("C:/Windows")
    val files = scan(dir)

    val out = new PrintWriter(new File("out.txt"))

    files foreach { file =>
      out.println(file)
    }

    out.flush()
    out.close()
  }

  def scan(file: File): List[File] = {

    @scala.annotation.tailrec
    def sc(acc: List[File], files: List[File]): List[File] = {
      files match {
        case Nil => acc
        case x :: xs => {
          x.isDirectory match {
            case false => sc(x :: acc, xs)
            // listFiles returns null on I/O errors or missing permissions
            case true => sc(acc, xs ::: Option(x.listFiles).map(_.toList).getOrElse(Nil))
          }
        }
      }
    }

    sc(List(), List(file))
  }
}
Milind
0

A minor improvement to the accepted answer: by partitioning on _.isDirectory, this function returns only files (directories are excluded).

import java.io.File
def recursiveListFiles(f: File): Array[File] = {
  val (dir, files)  = f.listFiles.partition(_.isDirectory)
  files ++ dir.flatMap(recursiveListFiles)
}
0

Get all files under a path, excluding directories:

import java.io.File
import scala.collection.mutable.{ArrayBuffer, ListBuffer}

object pojo2pojo {

    def main(args: Array[String]): Unit = {
        val file = new File("D:\\tmp\\tmp")
        val files = recursiveListFiles(file)
        println(files.toList)
        // List(D:\tmp\tmp\1.txt, D:\tmp\tmp\a\2.txt)
    }

    def recursiveListFiles(f: File):ArrayBuffer[File] = {
        val all = collection.mutable.ArrayBuffer(f.listFiles:_*)
        val files = all.filter(_.isFile)
        val dirs = all.filter(_.isDirectory)
        files ++ dirs.flatMap(recursiveListFiles)
    }

}


-1

Why are you using Java's File instead of Scala's AbstractFile?

With Scala's AbstractFile, the iterator support allows writing a more concise version of James Moore's solution:

import scala.reflect.io.AbstractFile  
def tree(root: AbstractFile, descendCheck: AbstractFile => Boolean = {_=>true}): Stream[AbstractFile] =
  if (root == null || !root.exists) Stream.empty
  else
    (root.exists, root.isDirectory && descendCheck(root)) match {
      case (false, _) => Stream.empty
      case (true, true) => root #:: root.iterator.flatMap { tree(_, descendCheck) }.toStream
      case (true, false) => Stream(root)
    }