I have a directory structure of the form:

base_directory / level_one_a, level_one_b, level_one_c /

Each of the level_one_x directories then contains many further subdirectories, i.e.

/ level_one_a_1,level_one_a_2,level_one_a_3...

and so on for level_one_b and level_one_c.

Inside level_one_a_1 there are more still, i.e. level_one_a_1_I, level_one_a_1_II, level_one_a_1_III, level_one_a_1_IV...

Finally, inside level_one_a_1_IV, and every other directory on that level, are the files I want to operate on.

I guess a shorter way to say that would be start/one/two/three/*files*

There are a great many files, and I want to process them all with a simple Java program I wrote:

    // br is a BufferedReader opened on the file to process
    try 
    {
        // read the whole file into one string
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();

        while (line != null) 
        {
            sb.append(line);
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
        String everything = sb.toString();

        // parse the content and print the text of the selected elements
        Document doc = Jsoup.parse(everything);
        String link = doc.select("block.full_text").text();
        System.out.println(link);
    }
    finally 
    {
        br.close();
    }

It uses jsoup.

I'd like to construct this program so that it can navigate the directory structure on its own, grab each file, and process it with the code above, using BufferedReader and FileReader I guess. How can I do that? I tried implementing this solution but couldn't get it to work.

Ideally I want to write the output for each file under a unique name, i.e. if the file is named 00001.txt it might be saved as 00001_output.txt, but that's a horse of a different colour.

smatthewenglish

2 Answers

Just use java.io.File and its listFiles method. See the Javadoc for the File API.

Similar question on SO was posted here: Recursively list files in Java
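
A minimal sketch of what that recursion could look like, assuming every non-directory entry below the start directory should be collected. The class name ListFilesRecursively, the method collectFiles and the "base_directory" default are made up for illustration and are not part of the File API:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ListFilesRecursively {

    // Collects every non-directory entry below 'dir' into 'result', depth-first.
    static void collectFiles(File dir, List<File> result) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles returns null if 'dir' is not a directory or cannot be read
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collectFiles(entry, result);
            } else {
                result.add(entry);
            }
        }
    }

    public static void main(String[] args) {
        // "base_directory" is a placeholder; pass the real start directory as an argument
        File base = new File(args.length > 0 ? args[0] : "base_directory");
        List<File> files = new ArrayList<>();
        collectFiles(base, files);
        for (File file : files) {
            System.out.println(file.getPath());
        }
    }
}
```

Each File in the resulting list can then be handed to the processing code from the question, for example through new BufferedReader(new FileReader(file)).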

Kousalik

You can also achieve this with the Java NIO.2 API.

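import static java.nio.file.FileVisitResult.CONTINUE;

import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashSet;
import java.util.Set;

public class ProcessFiles extends SimpleFileVisitor<Path> {

    static final String OUT_FORMAT = "%-17s: %s%n";
    // number of path elements from base_directory down to the directory holding the files
    static final int MAX_DEPTH = 4;
    static final Path baseDirectory = Paths.get("R:/base_directory");

    public static void main(String[] args) throws IOException {
        Set<FileVisitOption> visitOptions = new HashSet<>();
        visitOptions.add(FileVisitOption.FOLLOW_LINKS);
        Files.walkFileTree(baseDirectory, visitOptions, MAX_DEPTH,
                new ProcessFiles()
        );
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
        if (file.getNameCount() <= MAX_DEPTH) {
            // file sits above the destination level, ignore it
            // (SKIP_SUBTREE only has an effect in preVisitDirectory, so CONTINUE is returned here)
            System.out.printf(OUT_FORMAT, "skip wrong level", file);
            return CONTINUE;
        } else {
            // add probably a file name check
            System.out.printf(OUT_FORMAT, "process file", file);
            return CONTINUE;
        }
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attr) {
        if (dir.getNameCount() < MAX_DEPTH) {
            // still above the destination level, keep descending
            System.out.printf(OUT_FORMAT, "walk into dir", dir);
            return CONTINUE;
        }
        // on the destination level only the directory with the expected name is entered
        if (dir.getName(MAX_DEPTH - 1).toString().equals("level_one_a_1_IV")) {
            System.out.printf(OUT_FORMAT, "destination dir", dir);
            return CONTINUE;
        } else {
            System.out.printf(OUT_FORMAT, "skip dir name", dir);
            return FileVisitResult.SKIP_SUBTREE;
        }
    }
}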

Assuming the following directory/file structure:

base_directory
base_directory/base_directory.file
base_directory/level_one_a
base_directory/level_one_a/level_one_a.file
base_directory/level_one_a/level_one_a_1
base_directory/level_one_a/level_one_a_1/level_one_a_1.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_I
base_directory/level_one_a/level_one_a_1/level_one_a_1_I/level_one_a_1_I.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_II
base_directory/level_one_a/level_one_a_1/level_one_a_1_II/level_one_a_1_II.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_III
base_directory/level_one_a/level_one_a_1/level_one_a_1_III/level_one_a_1_III.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_IV
base_directory/level_one_a/level_one_a_1/level_one_a_1_IV/level_one_a_1_IV.file
base_directory/someother_a
base_directory/someother_a/someother_a.file
base_directory/someother_a/someother_a_1
base_directory/someother_a/someother_a_1/someother_a_1.file
base_directory/someother_a/someother_a_1/someother_a_1_I
base_directory/someother_a/someother_a_1/someother_a_1_I/someother_a_1_I.file
base_directory/someother_a/someother_a_1/someother_a_1_II
base_directory/someother_a/someother_a_1/someother_a_1_II/someother_a_1_II.file
base_directory/someother_a/someother_a_1/someother_a_1_III
base_directory/someother_a/someother_a_1/someother_a_1_III/someother_a_1_III.file
base_directory/someother_a/someother_a_1/someother_a_1_IV
base_directory/someother_a/someother_a_1/someother_a_1_IV/someother_a_1_IV.file

you would get the following output (for demonstration):

walk into dir    : R:\base_directory
skip wrong level : R:\base_directory\base_directory.file
walk into dir    : R:\base_directory\level_one_a
skip wrong level : R:\base_directory\level_one_a\level_one_a.file
walk into dir    : R:\base_directory\level_one_a\level_one_a_1
skip wrong level : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1.file
skip dir name    : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_I
skip dir name    : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_II
skip dir name    : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_III
destination dir  : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_IV
process file     : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_IV\level_one_a_1_IV.file
walk into dir    : R:\base_directory\someother_a
skip wrong level : R:\base_directory\someother_a\someother_a.file
walk into dir    : R:\base_directory\someother_a\someother_a_1
skip wrong level : R:\base_directory\someother_a\someother_a_1\someother_a_1.file
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_I
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_II
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_III
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_IV
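
For the "process file" branch, and for the output naming mentioned in the question (00001.txt saved as 00001_output.txt), something along these lines might work. This is only a sketch: the class OutputNaming and the methods outputPathFor and process are hypothetical names, UTF-8 is an assumed encoding, and the loop simply copies each line where the jsoup extraction from the question would go.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputNaming {

    // Derives a sibling path such as 00001_output.txt from 00001.txt.
    static Path outputPathFor(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String outputName = (dot > 0)
                ? name.substring(0, dot) + "_output" + name.substring(dot)
                : name + "_output";
        return input.resolveSibling(outputName);
    }

    // Reads 'input' line by line and writes the result next to it.
    static Path process(Path input) throws IOException {
        Path output = outputPathFor(input);
        // UTF-8 is an assumption; use the charset the files are really stored in
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // placeholder: the jsoup extraction from the question would go here
                writer.write(line);
                writer.newLine();
            }
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java OutputNaming <file>");
            return;
        }
        System.out.println(process(Paths.get(args[0])));
    }
}
```

Calling process(file) from visitFile would write the result into the same directory as the input. In that case the file name check hinted at in the visitor should probably skip names ending in _output, so that a second run does not process its own results.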

Some links to the Oracle tutorial for further reading:
Walking the File Tree
Finding Files

SubOptimal