44

I want to list all the FILES within the specified directory and subdirectories within that directory. No directories should be listed.

My current code is below. It does not work properly as it only lists the files and directories within the specified directory.

How can I fix this?

final List<Path> files = new ArrayList<>();

Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
try
{
  DirectoryStream<Path> stream;
  stream = Files.newDirectoryStream(path);
  for (Path entry : stream)
  {
    files.add(entry);
  }
  stream.close();
}
catch (IOException e)
{
  e.printStackTrace();
}

for (Path entry: files)
{
  System.out.println(entry.toString());
}
Ahmed Ashour
Danny Rancher
  • what do you mean "No Directories should be listed" – gaurav5430 Jan 08 '14 at 04:47
  • possible duplicate of [List all files from a directory recursively with Java](http://stackoverflow.com/questions/2534632/list-all-files-from-a-directory-recursively-with-java) – Brian Roach Jan 08 '14 at 04:47
  • @BrianRoach How is this a duplicate? I asked to solve the problem with nio.file.DirectoryStream. – Danny Rancher Jan 08 '14 at 04:48
  • "No directories should be listed" means that if the specified directory contains subdirectories, only the files within those subdirectories should be listed, not the subdirectories themselves. – Danny Rancher Jan 08 '14 at 04:50
  • @BrianRoach this is not a duplicate, this question is specific to Java 7 api for DirectoryStream, not listFiles() that you linked to. – Fred Mar 06 '14 at 03:07
  • @fred You mean *exactly* the way two of the up-voted answers to that Q explain how to do? – Brian Roach Mar 06 '14 at 04:56
  • @BrianRoach The only "upvoted" answer is in direct reply to the question and also references DirectoryStream, still not a duplicate. – Fred Mar 06 '14 at 18:05
  • @BrianRoach I asked for a method using Java 7 nio. You think I'm duplicating a question asking for a method using Java 6 io. They are different. Please realise your mistake. Regards. – Danny Rancher Mar 06 '14 at 23:28

9 Answers

77

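Java 8 provides a concise way to do this:

Files.walk(path)

This method returns a Stream<Path> covering the whole tree under path. The stream includes the starting path and every directory as well as the files, so filter it if you only want files.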
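As a minimal sketch of how Files.walk can be narrowed down to files only, the snippet below filters the stream with Files.isRegularFile and closes it with try-with-resources. The class name WalkRegularFiles and the method name listRegularFiles are illustrative and not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkRegularFiles {

    // Returns every regular file under root, at any depth; directories are filtered out.
    static List<Path> listRegularFiles(Path root) throws IOException {
        // Files.walk keeps directory handles open, so the stream is closed when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Walks the directory given as the first argument, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listRegularFiles(root).forEach(System.out::println);
    }
}
```

Collecting inside the try block means the caller never has to remember to close the stream.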

Vladimir Petrakovich
32

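Write a method that calls itself whenever the next entry is a directory, and adds only non-directory entries to the list:

// 'files' is the List<Path> declared outside this method that collects the results
void listFiles(Path path) throws IOException {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
        for (Path entry : stream) {
            if (Files.isDirectory(entry)) {
                listFiles(entry);   // descend into the subdirectory
            } else {
                files.add(entry);   // only files are collected, never directories
            }
        }
    }
}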
Evgeniy Dorofeev
28

Check FileVisitor, very neat.

Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
final List<Path> files = new ArrayList<>();
try {
    Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            if (!attrs.isDirectory()) {
                files.add(file);
            }
            return FileVisitResult.CONTINUE;
        }
    });
} catch (IOException e) {
    e.printStackTrace();
}
Adisesha
  • I don't think it is as neat as Evgeniy's solution? – Danny Rancher Jan 08 '14 at 05:08
  • It's subjective. I like the FileVisitor approach. – Adisesha Jan 08 '14 at 05:14
  • The issue with walkFileTree() is that it's an inner class, and since Java doesn't support closures you can't access parent elements. If you're just manipulating files it works well, but if you need to reference this from another scope it's not a good match. – Fred Mar 06 '14 at 03:04
  • @Fred An inner class is not the right fit when your solution needs to modify primitive or immutable data types, which closures in languages like JS allow you to do. For example, getting the longest file name. I believe that is what you said in the last sentence. My answer was in the context of this question. – Adisesha Mar 06 '14 at 08:49
  • I like this approach! – silentsudo Feb 04 '18 at 07:46
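The limitation discussed in the comments above (an anonymous class cannot reassign a captured local variable) is commonly worked around with a mutable holder. The following is only a sketch of that idea for the "longest file name" example; the class name LongestFileName and the choice of AtomicReference as the holder are assumptions made for illustration, not something the commenters proposed.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicReference;

class LongestFileName {

    // Returns the longest file name found under root, or "" if there are no files.
    static String longestFileName(Path root) throws IOException {
        // The reference itself is final, but the value it holds can be replaced
        // from inside the anonymous visitor class.
        final AtomicReference<String> longest = new AtomicReference<>("");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                String name = file.getFileName().toString();
                if (name.length() > longest.get().length()) {
                    longest.set(name);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return longest.get();
    }
}
```

A single-element array would work the same way; AtomicReference is used here only because it reads more clearly.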
6

If you want to avoid having the function calling itself recursively and having a file list that is a member variable, you can use a stack:

private List<Path> listFiles(Path path) throws IOException {
    Deque<Path> stack = new ArrayDeque<>();
    final List<Path> files = new LinkedList<>();

    stack.push(path);

    while (!stack.isEmpty()) {
        // try-with-resources closes the stream even if iteration throws
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(stack.pop())) {
            for (Path entry : stream) {
                if (Files.isDirectory(entry)) {
                    stack.push(entry);   // visit this subdirectory later
                } else {
                    files.add(entry);
                }
            }
        }
    }

    return files;
}
Duarte Meneses
  • Recursion uses an implicit stack, what's the reason for using an external one? Are you gaining anything? – Abhijit Sarkar Jul 27 '18 at 17:19
  • There are no major differences; it's more of a coding-style preference. Here are a few reasons, though: (1) if the recursion is too deep, you end up with a StackOverflowError; (2) you could argue that an explicit stack makes the code easier to read; (3) you don't leave a stream open for every level of recursion; (4) if you need any pre- or post-processing (such as error handling), or need to keep context during the traversal (such as the list of added files), you don't need another function calling this one. – Duarte Meneses Aug 16 '18 at 11:14
4

This is the shortest implementation I came up with:

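final List<Path> files = new ArrayList<>();
Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
// try-with-resources closes the directory handles held by the stream
try (Stream<Path> entries = Files.walk(path)) {
    // Files.walk also returns directories, so keep only regular files
    entries.filter(Files::isRegularFile).forEach(files::add);
} catch (IOException e) {
    e.printStackTrace();
}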
REjsmont
3

With RxJava, the requirement can be met in a number of ways while still using DirectoryStream from the JDK.

Either of the following approaches gives the desired result; I'll explain them in turn:

Approach 1. A recursive approach using the flatMap() and defer() operators

Approach 2. A recursive approach using the flatMap() and fromCallable() operators

Note: If you replace flatMap() with concatMap(), the directory tree is necessarily navigated in a depth-first-search (DFS) manner. With flatMap(), DFS order is not guaranteed.

Approach 1: Using flatMap() and defer()

   private Observable<Path> recursiveFileSystemNavigation_Using_Defer(Path dir) {
       return Observable.<Path>defer(() -> {
            //
            // try-resource block
            //
            try(DirectoryStream<Path> children = Files.newDirectoryStream(dir))
            {
                //This intermediate storage is required because DirectoryStream can't be navigated more than once.
                List<Path> subfolders = Observable.<Path>fromIterable(children)
                                                        .toList()
                                                        .blockingGet();


                return Observable.<Path>fromIterable(subfolders)
                        /* Line X */    .flatMap(p -> !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_Using_Defer(p), Runtime.getRuntime().availableProcessors());

                //      /* Line Y */  .concatMap(p -> !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_Using_Defer(p));

            } catch (IOException e) {
                /*
                 This catch block is required even though DirectoryStream is a Closeable
                 resource opened in try-with-resources: both Files.newDirectoryStream()
                 and the implicit close() call can throw the checked IOException.
                */
                return Observable.<Path>empty();
            }
       });
    }

This approach finds the children of the given directory and emits them as Observables. If a child is a file, it is immediately available to a subscriber; otherwise flatMap() on Line X invokes the method recursively, passing each subdirectory as the argument. For each such subdirectory, flatMap() internally subscribes to its children all at the same time. This is like a chain reaction that needs to be controlled.

Therefore Runtime.getRuntime().availableProcessors() is passed as the maximum concurrency level for flatMap(), which prevents it from subscribing to all subfolders at the same time. Without a concurrency limit, imagine what would happen when a folder had 1000 children.

Using defer() prevents the DirectoryStream from being created prematurely and ensures it is created only when a real subscription to find its subfolders is made.

Finally, the method returns an Observable<Path> so that a client can subscribe and do something useful with the results, as shown below:

//
// Using the defer() based approach
//
recursiveDirNavigation.recursiveFileSystemNavigation_Using_Defer(startingDir)
                    .subscribeOn(Schedulers.io())
                    .observeOn(Schedulers.from(Executors.newFixedThreadPool(1)))
                    .subscribe(p -> System.out.println(p.toUri()));

A disadvantage of defer() is that it does not deal nicely with checked exceptions thrown by its argument function. Therefore, even though the DirectoryStream (which implements Closeable) was created in a try-with-resources block, we still had to catch the IOException, because the automatic closing of a DirectoryStream throws that checked exception.

In Rx-based style, using catch blocks for error handling feels a bit odd, because in reactive programming even errors are delivered as events. So why not use an operator that exposes such errors as events?

A better alternative is fromCallable(), available in RxJava 2.x. The second approach shows how to use it.

Approach 2: Using flatMap() and fromCallable()

This approach uses the fromCallable() operator, which takes a Callable as its argument. Since we want a recursive approach, the expected result of that Callable is an Observable of the children of the given folder. Since we want a subscriber to receive results as they become available, the method itself must also return an Observable. Because the result of the inner Callable is an Observable of children, the net effect is an Observable of Observables.

   private Observable<Observable<Path>> recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(Path dir) {
       /*
        * fromCallable() takes a Callable argument. In this case the Callable's return value is itself
        * an Observable of sub-paths, so the overall return value of this method is Observable<Observable<Path>>.
        * 
        * When subscribing to the final results, we flatten this return value.
        * 
        * The benefit of fromCallable() is that it catches checked exceptions thrown during the
        * Callable's call and exposes them through the onError() part of the operator chain if you need them.
        * 
        * The defer() operator does not give that flexibility; you have to catch and handle them explicitly.
        */
       return Observable.<Observable<Path>> fromCallable(() -> traverse(dir))
                                        .onErrorReturnItem(Observable.<Path>empty());

    }

    private Observable<Path> traverse(Path dir) throws IOException {
        //
        // try-resource block
        //
        try(DirectoryStream<Path> children = Files.newDirectoryStream(dir))
        {
            //This intermediate storage is required because DirectoryStream can't be navigated more than once.
            List<Path> subfolders = Observable.<Path>fromIterable(children)
                                                    .toList()
                                                    .blockingGet();

            return Observable.<Path>fromIterable(subfolders)
                    /* Line X */    .flatMap(p -> ( !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(p).blockingSingle())
                                             ,Runtime.getRuntime().availableProcessors());

            //      /* Line Y */  .concatMap(p -> ( !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(p).blockingSingle() ));

        }
    }

A subscriber will then need to flatten the results stream as shown below:

//
// Using the fromCallable() based approach
//
recursiveDirNavigation.recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(startingDir)
                        .subscribeOn(Schedulers.io())
                        .flatMap(p -> p)
                        .observeOn(Schedulers.from(Executors.newFixedThreadPool(1)))
                        .subscribe(filePath -> System.out.println(filePath.toUri()));

Why does Line X in traverse() use blockingSingle()?

Because the recursive function returns an Observable<Observable<Path>>, but flatMap() at that line needs an Observable<Path> to subscribe to.

Why does Line Y in both approaches use concatMap()?

Because concatMap() can be used instead when we don't want parallelism in the inner subscriptions that flatMap() would otherwise make.

In both approaches, the implementation of method isFolder looks like below:

private boolean isFolder(Path p) {
    // Anything that is not a regular file is treated as a folder
    return !p.toFile().isFile();
}

Maven coordinates for RxJava 2.0

<dependency>
    <groupId>io.reactivex.rxjava2</groupId>
    <artifactId>rxjava</artifactId>
    <version>2.0.3</version>
</dependency>

Imports in Java file

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.Executors;
import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;
NitinS
2

A complete example: it walks every subfolder and collects the names of the entries ending in .java.

Path configFilePath = FileSystems.getDefault().getPath("C:\\Users\\sharmaat\\Desktop\\issue\\stores");
List<Path> fileWithName = Files.walk(configFilePath)
                .filter(s -> s.toString().endsWith(".java"))
                .map(Path::getFileName)
                .sorted()
                .collect(Collectors.toList());

for (Path name : fileWithName) {
    // printing the name of file in every sub folder
    System.out.println(name);
}
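One caveat with filtering on the path string alone is that a directory whose name happens to end in the suffix would also match. As a hedged alternative sketch, Files.find passes each entry's BasicFileAttributes to the matcher, so the suffix test can be combined with a regular-file check. The class name FindJavaFiles and the method name findBySuffix are hypothetical, and unlike the snippet above this version returns full paths instead of bare file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FindJavaFiles {

    // Returns the regular files under root whose name ends with suffix, sorted by path.
    static List<Path> findBySuffix(Path root, String suffix) throws IOException {
        // Integer.MAX_VALUE means "search all levels"; the stream is closed by try-with-resources.
        try (Stream<Path> matches = Files.find(root, Integer.MAX_VALUE,
                (p, attrs) -> attrs.isRegularFile()
                        && p.getFileName().toString().endsWith(suffix))) {
            return matches.sorted().collect(Collectors.toList());
        }
    }
}
```

For example, findBySuffix(configFilePath, ".java") would skip a folder named "pkg.java" while still returning the .java files inside it.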
Jon Bates
1

Try this. It traverses every folder and prints both the folders and the files:

public static void traverseDir(Path path) {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
        for (Path entry : stream) {
            if (Files.isDirectory(entry)) {
                System.out.println("Sub-Folder Name : " + entry.toString());
                traverseDir(entry);
            } else {
                System.out.println("\tFile Name : " + entry.toString());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Theodore
-1

Try this. It returns a list of the paths of all files in the directory and its subdirectories. Since subdirectories can be nested to any depth, it uses recursion.

public class DriectoryFileFilter {
    private List<String> filePathList = new ArrayList<String>();

    public List<String> read(File file) {
        if (file.isFile()) {
            filePathList.add(file.getAbsolutePath());
        } else if (file.isDirectory()) {
            File[] listOfFiles = file.listFiles();
            if (listOfFiles != null) {
                for (int i = 0; i < listOfFiles.length; i++){
                    read(listOfFiles[i]);
                }
            } else {
                System.out.println("[ACCESS DENIED]");
            }
        }
        return filePathList;
    }
}
Zaw Than oo