0

I've written a small function to calculate the size of all the files in a directory. The actual function does a lot more but this example has been used for brevity.

This works and walking the directory recursively is easy enough but I'd like to exclude all the filenames that have already been processed. I'd like to keep track of all the filenames in a List so that before getting the size of a file, I check if it exists in the List and if it does, it should be excluded. I don't want any MD5 checksums or anything. Filenames are good enough for my situation.

Since I can only return one value from a function and Java doesn't allow pass-by-reference, I'm pretty lost as to what is the best way to implement this. Here's my code:

public static Long getFileSize(File dirDirectory) {
    Long lngSize = new Long(0);

    for (File filItem : dirDirectory.listFiles()) {
        if (filItem.isDirectory()) {
            lngSize += getFileSize(filItem);
        }
        else {
            //If a file with the same filename has already been calculated
            //then exclude it
            //else
            //include it.
            lngSize += filItem.length();
        }
    }

    return lngSize;
}
DNA
Mridang Agarwalla
  • What do you mean Java can't pass by reference? All objects are passed by reference in Java? – Ruan Mendes Nov 06 '12 at 20:00
  • @Juan: object references are passed by value. Subtle, but important. – Dave Nov 06 '12 at 20:02
  • I'm always confused by this logic. – Mridang Agarwalla Nov 06 '12 at 20:06
  • http://stackoverflow.com/questions/40480/is-java-pass-by-reference – Dave Nov 06 '12 at 20:14
  • @Dave I think that would be clearer if it said, you can't pass a reference to a pointer, but I do see what it means, I just don't see how it's relevant to this question, that's why I asked; the OP could return an object with the size and the files, but a better idea is to pass the current list around as the answers suggest – Ruan Mendes Nov 06 '12 at 20:19
  • @Juan: I don't see how it's relevant either; I was just replying to your comment. With regards to pointers, Java doesn't have them, or so I keep hearing. This makes the `NullPointerException` an interesting beast :) – Dave Nov 06 '12 at 20:23
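The distinction raised in the comments above can be illustrated with a minimal sketch. The class and method names here (`ReferenceDemo`, `addName`, `replaceSet`) are hypothetical and chosen only for this example. It assumes nothing beyond the standard collections: changes made to the passed object are visible to the caller, while reassigning the parameter is not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the question's code.
class ReferenceDemo {

    // Mutates the object the caller's reference points to: visible to the caller.
    static void addName(Set<String> names) {
        names.add("report.txt");
    }

    // Reassigns the local copy of the reference: invisible to the caller.
    static void replaceSet(Set<String> names) {
        names = new HashSet<String>();
        names.add("ignored.txt");
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<String>();
        addName(seen);
        replaceSet(seen);
        System.out.println(seen); // prints [report.txt]
    }
}
```

This is why passing a collection into a recursive method is enough to share state across calls, even though Java has no pass-by-reference.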

6 Answers

3

Don't use a List, use a HashSet. A List needs an O(n) scan to check whether a file is already there, whereas a HashSet lookup is O(1) on average.

By making the method public and the helper function private, you don't expose the HashSet implementation to the rest of your program (which doesn't and shouldn't care about it).

public static Long getFileSize(File dirDirectory) {
    return getFileSize(dirDirectory, new HashSet<File>());
}

private static Long getFileSize(File dirDirectory, HashSet<File> prevProcess) {
    Long lngSize = new Long(0);

    for (File filItem : dirDirectory.listFiles()) {
        // Note: File.equals compares paths, not bare filenames.
        if (prevProcess.contains(filItem)) continue;
        if (filItem.isDirectory()) {
            // Pass the same set down so every level of the walk shares it.
            lngSize += getFileSize(filItem, prevProcess);
        }
        else {
            lngSize += filItem.length();
        }
        prevProcess.add(filItem);
    }

    return lngSize;
}
durron597
1

You can do it like this:

public static Long getFileSize(File dirDirectory) {
    return getFileSize(dirDirectory, new HashSet<String>());
}

public static Long getFileSize(File dirDirectory, Set<String> previouslyProcessedFiles) {
    //DO IT HERE AS YOU WISH
}
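One way the placeholder above might be filled in is sketched below. It assumes that "already processed" means "a file with the same bare name was counted earlier in this walk", as the question describes. The class name `UniqueNameSizer` is hypothetical, and the helper is made private here so that callers never see the set.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class UniqueNameSizer {

    public static long getFileSize(File dirDirectory) {
        return getFileSize(dirDirectory, new HashSet<String>());
    }

    private static long getFileSize(File dirDirectory, Set<String> previouslyProcessedFiles) {
        long size = 0;
        File[] children = dirDirectory.listFiles();
        if (children == null) {
            // Not a directory, or it could not be read.
            return 0;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                // The same set is passed down, so names are tracked across the whole tree.
                size += getFileSize(child, previouslyProcessedFiles);
            } else if (previouslyProcessedFiles.add(child.getName())) {
                // add() returns true only the first time a name is seen.
                size += child.length();
            }
        }
        return size;
    }
}
```

Because `listFiles()` makes no promise about ordering, which of two same-named files gets counted depends on the order in which the file system returns them.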
Hakan Serce
0

Pass a Set along:

public static Long getFileSize(Set<String> alreadySeen, File dirDirectory) {
    long lngSize = 0;

    for (File filItem : dirDirectory.listFiles()) {
        if (filItem.isDirectory()) {
            // Pass the same set down so names are tracked across the whole tree.
            lngSize += getFileSize(alreadySeen, filItem);
        }
        else {
            // If a file with the same filename has already been counted,
            // exclude it; otherwise include it.
            if (! alreadySeen.contains(filItem.getName())) {
                alreadySeen.add(filItem.getName());
                lngSize += filItem.length();
            }
        }
    }
    return lngSize;
}

To call:

Long size = getFileSize(new HashSet<String>(), myDirectory);

Also, you're better off using a long counter rather than Long to avoid Java needing to continually unbox/rebox your total.
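As a rough illustration of that point, the two hypothetical methods below (`sumBoxed` and `sumPrimitive`, names invented for this sketch) compute the same total. The only difference is the type of the accumulator.

```java
// Hypothetical demo class, not part of the answer's code.
class BoxingDemo {

    // With a Long accumulator, each += unboxes total, adds, and boxes the result again.
    static Long sumBoxed(long[] sizes) {
        Long total = 0L;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }

    // With a primitive long accumulator, no boxing happens inside the loop.
    static long sumPrimitive(long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }
}
```

Both return the same value. The primitive version simply avoids the repeated conversions, and it can still be returned from a method declared as `Long`, since the result is boxed once at the end.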

By the way, it is simple to walk a directory tree without recursion, just add the directories you encounter on to a list to be processed later:

public static Long getFileSize(File dirDirectory) {
    long lngSize = 0;
    Deque<File> unprocessedDirs = new ArrayDeque<File>();
    unprocessedDirs.add(dirDirectory);
    Set<String> alreadySeen = new HashSet<String>();
    while (!unprocessedDirs.isEmpty()) {
        File dir = unprocessedDirs.removeFirst();

        for (File filItem : dir.listFiles()) {
            if (filItem.isDirectory()) {
                unprocessedDirs.addFirst(filItem); 
            }
            else {
                //If a file with the same filename has already been calculated
                //then exclude it
                //else
                //include it.
                if (! alreadySeen.contains(filItem.getName())) {
                    alreadySeen.add(filItem.getName());
                    lngSize += filItem.length();
                }
            }
        }
    }
    return lngSize;
}
Adrian Pronk
0

How about this:

public static Long getFileSize(File dirDirectory, List<String> processed) {
    Long lngSize = new Long(0);

    for (File filItem : dirDirectory.listFiles()) {
        if (filItem.isDirectory()) {
            lngSize += getFileSize(filItem, processed);

        } else {
            String filName = filItem.getName();
            if (processed.contains(filName)) {
                continue;
            }
            lngSize += filItem.length();
            processed.add(filName);
        }
    }

    return lngSize;
}
Attila T
0

You can either use a global variable or pass the list as a parameter to the function. But my recommendation is to use a Set rather than a List, in particular a TreeSet or a HashSet.

You do not need to store duplicates, and with a List you would have to search the whole list for each file name, an expensive O(n) operation. A Set prevents duplicates, and lookups are much faster: O(1) on average for a HashSet and O(log n) for a TreeSet.
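A small sketch of how a Set can do the bookkeeping follows. The class `LookupDemo` and its method `countNew` are hypothetical names invented for this example. It relies only on the documented behaviour of `Set.add`, which returns `false` when the element is already present.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the answer.
class LookupDemo {

    // Returns how many of the given names were seen for the first time.
    static int countNew(Set<String> seen, String... names) {
        int added = 0;
        for (String name : names) {
            // add() both checks for and records the name in one call.
            if (seen.add(name)) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Either implementation works; they differ in lookup cost and ordering.
        System.out.println(countNew(new HashSet<String>(), "a.txt", "b.txt", "a.txt")); // prints 2
        System.out.println(countNew(new TreeSet<String>(), "a.txt", "b.txt", "a.txt")); // prints 2
    }
}
```

With a List, the equivalent check would need a `contains` call that scans the list, followed by a separate `add`.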

See: Hashset vs Treeset

thedayofcondor
0

I would suggest that you use the built-in FileFilter or FilenameFilter interfaces with the File.listFiles() method. This is more elegant and intuitive.

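import java.io.File;
import java.io.FileFilter;
import java.util.HashSet;
import java.util.Set;

public class FileSizeCalculator {

    public static void main(String[] args) {
        System.out.println(getFileSize(new File(".")));
    }

    public static Long getFileSize(File directory) {

        // One filter, and therefore one set, is shared by the whole walk.
        FileFilter uniqueFilter = new FileFilter() {
            Set<String> uniqueNames = new HashSet<String>();
            @Override
            public boolean accept(File file) {
                /**
                 * Directories always pass. For a file, add() returns true
                 * only if the set did not already contain its name.
                 */
                return file.isDirectory() || uniqueNames.add(file.getName());
            }
        };
        return getFileSize(directory, uniqueFilter);
    }

    private static long getFileSize(File directory, FileFilter filter) {
        long size = 0L;
        for (File file : directory.listFiles(filter)) {
            size += file.isDirectory() ? getFileSize(file, filter) : file.length();
        }
        return size;
    }
}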
zafarkhaja