10

How can I obtain an Array[io.BufferedSource] for all files that match a wildcard in a given directory?

Namely, how can I define a method io.Source.fromDir such that

val txtFiles: Array[io.BufferedSource] = io.Source.fromDir("myDir/*.txt") // ???

I noticed FileUtils in Apache Commons IO, but I would much prefer an approach based on the Scala API, without external dependencies.

elm
  • Close to a duplicate of this: http://stackoverflow.com/questions/2637643/how-do-i-list-all-files-in-a-subdirectory-in-scala (you'd need to create the BufferedSource for each file, but that's a trivial extension) – The Archetypal Paul Dec 18 '14 at 13:47
  • Perhaps the difference here is the interpretation of wildcards; I could not find a Scala-based answer in that respect. – elm Dec 18 '14 at 15:56
  • Probably isn't too difficult to translate a glob to a regexp (`*` -> `[^/]*`, `?` -> `.`, `.` -> `\.`)? – The Archetypal Paul Dec 18 '14 at 16:42
  • And of course, that question's already been answered: http://stackoverflow.com/questions/1247772/is-there-an-equivalent-of-java-util-regex-for-glob-type-patterns – The Archetypal Paul Dec 18 '14 at 16:44
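The glob-to-regexp translation suggested in the comments could be sketched roughly as follows. `globToRegex` is a hypothetical helper, not part of any library, and it handles only `*` and `?`; every other character is quoted literally, so character classes and `**` are not supported.

```scala
import scala.util.matching.Regex

// Hypothetical helper: translate a simple glob into a Regex using the
// mapping from the comment above (`*` -> `[^/]*`, `?` -> `.`).
def globToRegex(glob: String): Regex =
  glob.map {
    case '*' => "[^/]*"                  // any run of characters except '/'
    case '?' => "."                      // exactly one character
    case c   => Regex.quote(c.toString)  // everything else is taken literally
  }.mkString.r

// Usage: test a file name against the whole pattern.
val txt = globToRegex("*.txt")
println(txt.pattern.matcher("notes.txt").matches)
```

Matching through `pattern.matcher(...).matches` anchors the pattern at both ends, which is what a glob normally implies.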

4 Answers

11
scala> import reflect.io._, Path._
import reflect.io._
import Path._

scala> val r = """.*\.scala""".r
r: scala.util.matching.Regex = .*\.scala

scala> "/home/amarki/tmp".toDirectory.files map (_.name) flatMap { case n @ r() => Some(n) case _ => None }
res0: Iterator[String] = non-empty iterator

scala> .toList
res1: List[String] = List(bobsrandom.scala, ...)

or recursing

scala> import PartialFunction.{ cond => when }
import PartialFunction.{cond=>when}

scala> "/home/amarki/tmp" walkFilter (p => p.isDirectory || when(p.name) {
     | case r() => true })
res3: Iterator[scala.reflect.io.Path] = non-empty iterator
som-snytt
6

Here is an answer based on this great answer from @som-snytt:

scala> import reflect.io._, Path._
import reflect.io._
import Path._

scala> "/temp".toDirectory.files.map(_.path).filter(name => name matches """.*\.xlsx""")
res2: Iterator[String] = non-empty iterator

as an Array:

scala> "/temp".toDirectory.files.map(_.path).filter(name => name matches """.*\.xlsx""").toArray
res3: Array[String] = Array(/temp/1.xlsx, /temp/2.xlsx, /temp/3.xlsx, /temp/a.1.xlsx, /temp/Book1.xlsx, /temp/new.xlsx)
MaxU - stand with Ukraine
3

Using Java 8, it is possible to traverse a directory and all its subdirectories. Then convert the iterator to a Scala one, and filter for files ending with .txt:

import scala.collection.JavaConverters._
import java.nio.file.{Files, Paths}

Files.walk(Paths.get("mydir")).iterator().asScala.filter(file => file.toString.endsWith(".txt")).foreach(println)

Natan
  • Thanks for this, although I couldn't get `.filter(file => file.endsWith(".txt"))` to work. I had success with declaring a `PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**.txt")` and then `.filter(matcher.matches)`. See https://stackoverflow.com/a/25188854 – dmarwick Jul 07 '17 at 22:42
  • @dmarwick, you are correct. I forgot to add a `toString`. Corrected the answer. Thanks for your comment. – Natan Jul 09 '17 at 16:04
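The `PathMatcher` variant from the comment might look like this in Scala. This is a minimal sketch: `walkGlob` is a hypothetical name, and `JavaConverters` is used to stay consistent with the answer above (on Scala 2.13+ `scala.jdk.CollectionConverters` is the non-deprecated equivalent). Note that `Path.endsWith(String)` compares whole path elements rather than characters, which is why the version without `toString` did not match.

```scala
import java.nio.file.{FileSystems, Files, Path}
import scala.collection.JavaConverters._

// Hypothetical helper: walk `root` recursively and keep the regular files
// whose full path matches the given glob, e.g. "**.txt".
def walkGlob(root: Path, glob: String): List[Path] = {
  val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
  val stream  = Files.walk(root)
  try
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p) && matcher.matches(p))
      .toList                 // materialize before the stream is closed
  finally stream.close()
}
```

A call such as `walkGlob(java.nio.file.Paths.get("mydir"), "**.txt")` would then list the matching paths; the `**` lets the pattern cross directory boundaries.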
1

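A bit rough around the edges, but maybe something like:

// Returns a BufferedSource for each file in `dir` whose name matches `regex` (non-recursive).
def getFilesMatchingRegex(dir: String, regex: util.matching.Regex) = {
    // listFiles returns null if `dir` is not a readable directory
    Option(new java.io.File(dir).listFiles).getOrElse(Array.empty[java.io.File])
        .filter(file => file.isFile && regex.findFirstIn(file.getName).isDefined)
        .map   (file => io.Source.fromFile(file))
}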

Note that this won't fetch files in sub-directories, and it doesn't support the more advanced globbing features one might expect (à la ls ./**/*.scala), etc.
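For glob rather than regex matching, a non-recursive `fromDir` in the shape the question asks for could be sketched with `java.nio.file.Files.newDirectoryStream`, which accepts a glob directly. Here `fromDir` is a hypothetical standalone function (not a member of `scala.io.Source`), and splitting the argument at the last `/` into directory and pattern is an assumption about the intended input format.

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._
import scala.io.{BufferedSource, Source}

// Hypothetical helper: "myDir/*.txt" -> one BufferedSource per matching file.
// Only the given directory is searched, not its sub-directories.
def fromDir(pattern: String): Array[BufferedSource] = {
  val (dir, glob) = pattern.lastIndexOf('/') match {
    case -1 => (".", pattern)                        // no directory part
    case 0  => ("/", pattern.drop(1))                // file system root
    case i  => (pattern.take(i), pattern.drop(i + 1))
  }
  val stream = Files.newDirectoryStream(Paths.get(dir), glob)
  try
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .map(p => Source.fromFile(p.toFile))
      .toArray                // materialize before the stream is closed
  finally stream.close()
}
```

Each returned source holds an open file handle, so the caller is expected to `close()` every one of them after reading.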

Marth