I've implemented a solution that uses Quartz to read a folder at a fixed interval and, for each file, perform some operations and delete the file when it finishes. It runs smoothly when I don't have thousands of files in the directory.
getFiles(config.getString("input")) match {
  case Some(files) =>
    files.foreach { file =>
      try {
        // check if file is in use
        if (file.renameTo(file)) {
          process(file, config)
        }
      } catch {
        case e: Exception =>
      } finally {
        ...
      }
    }
  case None =>
    ...
}
def getFiles(path: String): Option[Array[File]] = {
  new File(path).listFiles() match {
    case files if files != null =>
      Some(files.filter(file => file.lastModified < Calendar.getInstance.getTimeInMillis - 5000))
    case _ =>
      None
  }
}
def process(file: File, clientConfig: Config): Unit = {
  ...
  file.delete()
}
Now my scenario is different: I'm working with thousands and thousands of files, and my throughput is very low, around 50 files/sec (each file is about 40 KB).
I was wondering what the best approach is to process this many files. Should I change getFiles() to return at most N elements and apply a FileLock to each one? With a FileLock I could pick up only the files that are not in use. Or should I use something from Java NIO?
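For reference, here is a rough sketch of what I have in mind, using java.nio.file to list the directory lazily, taking at most a fixed batch of files per run and trying an exclusive FileLock to skip files that are still being written. The names getBatch, batchSize, minAgeMillis and isFree are just placeholders, and I'm aware that FileLock is only advisory on most Unix systems, so it only helps if the writing process also takes a lock:

import java.io.File
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, Paths, StandardOpenOption}
import scala.collection.JavaConverters._

// Sketch only: list at most `batchSize` files that have not been modified for
// `minAgeMillis`, keeping only those we can lock exclusively right now.
def getBatch(path: String, batchSize: Int, minAgeMillis: Long): Vector[File] = {
  val cutoff = System.currentTimeMillis - minAgeMillis
  val stream = Files.newDirectoryStream(Paths.get(path))
  try {
    stream.iterator.asScala
      .filter(p => Files.isRegularFile(p) && p.toFile.lastModified < cutoff)
      .filter(isFree)              // skip files another process still holds a lock on
      .take(batchSize)
      .map(_.toFile)
      .toVector                    // materialise before the stream is closed
  } finally {
    stream.close()
  }
}

// Try to take an exclusive lock; null from tryLock means another process holds one.
// Note: file locks are advisory on most Unix systems.
def isFree(p: Path): Boolean = {
  try {
    val channel = FileChannel.open(p, StandardOpenOption.WRITE)
    try {
      val lock = channel.tryLock()
      if (lock != null) { lock.release(); true } else false
    } finally {
      channel.close()
    }
  } catch {
    case _: Exception => false   // can't open or lock it, treat as "in use"
  }
}

The idea is that newDirectoryStream iterates the directory lazily instead of materialising the whole listing the way listFiles() does, and each Quartz trigger would only ever touch batchSize files.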
Thanks in advance.