1

I need to read the absolute path, file name & size of the files in a directory. This is how I currently do it:

File directory = <dir_path>;
File[] listFiles = directory.listFiles();
for (int i = 0; i < listFiles.length; i++) {
    File file = listFiles[i];
    String fileName = file.getName();
    String filePath = file.getAbsolutePath();
    long fileLen = file.length();
    long filelastModified = file.lastModified();
    ...
}

My directory can have thousands of files in it. Since I/O operations are very expensive, is this the most efficient way to accomplish what I am doing?

Adi
  • 2
    Maybe [`Files.newDirectoryStream(...)`](https://docs.oracle.com/javase/10/docs/api/java/nio/file/Files.html#newDirectoryStream(java.nio.file.Path,java.lang.String)) helps. – Turing85 Apr 29 '18 at 15:04
  • @Turing85 Thanks for the suggestion. I will see if this fits my purpose. – Adi Apr 29 '18 at 15:10

5 Answers

2

In your case :

File[] listFiles = directory.listFiles();

will create 1000 File objects, but instantiating them is not an expensive I/O operation: unlike new FileInputStream(), new File() does not touch the file system.
Note, however, that you can avoid creating all the File objects at once, and so reduce memory consumption, by streaming the directory entries instead.
Files.newDirectoryStream(Path dir) that returns a DirectoryStream<Path> and Files.list(Path dir) that returns a Stream<Path> provide ways to achieve that.
Here's a post pointing out some differences between them.

So you could get the same result with the java.nio API in this way :

Path directory = ...;
// try-with-resources closes the directory handle once iteration is done
try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
    for (Path p : stream) {
        String fileName = p.getFileName().toString();
        String filePath = p.toAbsolutePath().toString();
        long fileLen = Files.size(p);
        long filelastModified = Files.getLastModifiedTime(p).toMillis();
    }
} catch (IOException e) {
    // FIXME to handle
}
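For comparison, here is a minimal sketch of the Files.list(Path) variant mentioned above. The class name ListWithStream, the method describe and the tab-separated output format are made up for illustration; they are not part of the original answer. The stream is closed with try-with-resources because it holds an open directory handle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
final class ListWithStream {

    // Returns one "name<TAB>absolutePath<TAB>size<TAB>lastModifiedMillis" line per entry.
    static List<String> describe(Path directory) throws IOException {
        // Files.list is lazy and keeps the directory open, so close it when done.
        try (Stream<Path> entries = Files.list(directory)) {
            return entries.map(p -> {
                try {
                    return p.getFileName() + "\t" + p.toAbsolutePath() + "\t"
                            + Files.size(p) + "\t"
                            + Files.getLastModifiedTime(p).toMillis();
                } catch (IOException e) {
                    // A lambda cannot throw a checked exception; rethrow it unchecked.
                    throw new UncheckedIOException(e);
                }
            }).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        describe(dir).forEach(System.out::println);
    }
}
```

Collecting into a list is only for the sake of the example; with a very large directory you would probably process each entry inside the stream instead.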

Edit for comment :

What if there are sub-directories & there is a need to retrieve the details of files inside the sub-directories too?

In this case, Files.walk() is more suitable because it is recursive.
The code is very similar:

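Path directory = ...;
// Files.walk also returns the directories themselves, starting with the root;
// the stream must be closed, hence try-with-resources
try (Stream<Path> paths = Files.walk(directory)) {
    paths.forEach(p -> {
        try {
            // same code ....
        } catch (IOException e) {
            // FIXME to handle
        }
    });
} catch (IOException e) {
    // FIXME to handle
}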
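Another option for the recursive case, offered only as a sketch: Files.walkFileTree hands each file's BasicFileAttributes to the visitor, so size and modification time are available without separate Files.size / Files.getLastModifiedTime calls. The class name TreeListing, the method describe and the output format are hypothetical, and whether this is measurably faster on your file system is something to verify by profiling.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class TreeListing {

    // Returns one "name<TAB>absolutePath<TAB>size<TAB>lastModifiedMillis" line
    // per non-directory entry found under root, at any depth.
    static List<String> describe(Path root) throws IOException {
        List<String> out = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs was already read by the walker; no extra lookup needed here.
                out.add(file.getFileName() + "\t" + file.toAbsolutePath() + "\t"
                        + attrs.size() + "\t" + attrs.lastModifiedTime().toMillis());
                return FileVisitResult.CONTINUE;
            }
            // visitFileFailed is not overridden, so an unreadable entry
            // aborts the walk with an IOException (the default behaviour).
        });
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        describe(root).forEach(System.out::println);
    }
}
```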
davidxxx
  • This approach looks cleaner than mine. What if there are sub-directories & there is a need to retrieve the details of files inside the sub-directories too? – Adi Apr 29 '18 at 15:49
  • 1
    `java.nio` is designed to let you write cleaner code. For your question you should use `Files.walk(directory)` instead. I updated the answer. – davidxxx Apr 29 '18 at 16:38
2

I'd use File.list() rather than listFiles(): it's a bit closer to the native API and creates fewer File objects up front. But that's a small gain.

It's more interesting to note that File.list() returns only the child names, so you save a few getter calls, and the path is the same for all children of a given parent, which saves more trivial getter calls.

You won't save on size and date; those have to be queried once per file, sorry.
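A rough sketch of that idea, assuming a made-up class ListNames with a describe method and a tab-separated output format (none of which come from the original post): the parent path is computed once, the child names come straight from File.list(), and a File is created per child only to query size and date.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class ListNames {

    // Returns one "name<TAB>absolutePath<TAB>size<TAB>lastModifiedMillis" line per child.
    static List<String> describe(File directory) throws IOException {
        String[] names = directory.list();
        if (names == null) {
            // File.list() returns null if the path is not a directory or on I/O error.
            throw new IOException("Cannot list " + directory);
        }
        // The parent path is identical for every child, so compute it once.
        String parent = directory.getAbsolutePath();
        String prefix = parent.endsWith(File.separator) ? parent : parent + File.separator;

        List<String> out = new ArrayList<>(names.length);
        for (String name : names) {
            // Size and date still need one query each per file.
            File child = new File(directory, name);
            out.add(name + "\t" + prefix + name + "\t"
                    + child.length() + "\t" + child.lastModified());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        describe(new File(args.length > 0 ? args[0] : ".")).forEach(System.out::println);
    }
}
```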

user2023577
2

Since Java 7, java.nio.file.DirectoryStream<Path> offers an alternative that can bring a huge gain in performance, notably when only the first entries of a very large directory are needed.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
...
    private static void nioDir( String filePath, int maxFiles )
       throws IOException {
      int i = 1;
      Path dir = FileSystems.getDefault().getPath( filePath );
      // try-with-resources closes the stream even if an exception is thrown
      try (DirectoryStream<Path> stream = Files.newDirectoryStream( dir )) {
        for (Path path : stream) {
          System.out.println( "" + i + ": " + path.getFileName() );
          if (++i > maxFiles) break;
        }
      }
    }
RealHowTo
  • 1
    More important, it provides reliability. Instead of just returning null on failure, it provides a useful exception. – VGR Apr 29 '18 at 16:26
  • Can you quantify "huge"? – Stephen C Apr 29 '18 at 22:38
  • @stephenc, in my case the improvement was spectacular: I process only the first 10 files of a directory containing more than 60,000 files. The classical approach builds an array of all the files and then loops over the first 10. With the java.nio approach, a stream is opened, the files are processed as needed, and the stream is closed after the 10th file. With NIO, the benchmark takes ~160 ms; with the classical approach it took ~73500 ms. – RealHowTo Apr 30 '18 at 02:19
  • 2
    Ah ... that's a different use-case. The OP wants to process all of the files in the directory, not just the first 10. – Stephen C Apr 30 '18 at 06:40
1

AFAIK, this is close to as efficient as possible in Java. You might be able to squeeze out another 2 to 5 percent, but that's typically not the kind of performance improvement that is worthwhile.

The problem is that a typical OS doesn't provide a way to retrieve the metadata for multiple files at a time, or retrieve multiple metadata values at a time.

I expect that the metadata operations (length(), lastModified(), etc.) will use the vast majority of the time. But it is worth profiling your application to verify that.
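One thing that may be worth trying while profiling: java.nio's Files.readAttributes is documented as reading a file's attributes as a bulk operation, so size and modification time can come from a single call per file. The sketch below uses a made-up class BulkAttributes and method sizeAndModified; whether it actually beats separate length() / lastModified() calls on a given OS and file system is an assumption to measure, not a given.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper class, for illustration only.
final class BulkAttributes {

    // Returns { sizeInBytes, lastModifiedMillis }, both taken from one attribute read.
    static long[] sizeAndModified(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return new long[] { attrs.size(), attrs.lastModifiedTime().toMillis() };
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path p = Paths.get(args.length > 0 ? args[0] : ".");
        long[] result = sizeAndModified(p);
        System.out.println(p.toAbsolutePath() + ": " + result[0] + " bytes, modified at " + result[1]);
    }
}
```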

Having said this, your application's I/O is probably not as slow as you think. It is likely that the OS will read and cache the disk blocks containing the metadata. The syscalls that read the file metadata will return cached information most of the time. (Of course, this is OS specific, and dependent on the type of file system you are using.)

Stephen C
  • While it may be efficient, the latency is horrible because the method doesn't return until all the directory entries are scanned into the JVM's memory. Before the newer methods listed in other answers came about, I've actually had to implement `opendir()`/`readdir()`/`closedir()` in JNI just to work around the latency on huge directories. – Andrew Henle Apr 29 '18 at 17:07
  • The OP doesn't say that latency (time to getting the first entry) matters. If it does, then `DirectoryStream` is easier to use than JNI. – Stephen C Apr 30 '18 at 06:41
0

I had a similar issue loading a whole bunch of files from storage, and after hours of tweaking I came to this conclusion:

Use File.list() to get the file names and attach their directory paths manually,

and create File objects only when you need them.