1

I am creating a file containing a list of sub-directories

task createNotes {
  doLast {
    def myFile = new File("my-notes.txt")
    def file = new File("src/test/")
    println myFile.exists()
    myFile.delete()
    println myFile.exists()
    println myFile.absolutePath
    println file.absolutePath
    myFile.withWriter {
      out ->
        file.eachDir { dir ->
          out.println dir.getName()
        }
    }
  }
}

Apparently the sort order can't be guaranteed, but every time I run it I get the same sort order:

soft
java
calc
conc
caab
pres

If I change the "soft" dir to "sofp" the output is:

java
sofp
calc
conc
caab
pres

When I change the name back, it goes to the original order.

It doesn't appear to be sorted in any particular order - which matches the docs saying the order can't be guaranteed, but if so, why is it always giving me the same sort every time?

jannis
  • 4,843
  • 1
  • 23
  • 53
opticyclic
  • 7,412
  • 12
  • 81
  • 155
  • 1
    See [my answer to a different question](https://stackoverflow.com/a/66839644/40342). The first couple of paragraphs apply here: don't confuse reproducible order (i.e. this specific implementation on this specific file system produces this specific order) with guaranteed order (i.e. it's guaranteed that all implementations will produce a specific order on all file systems). – Joachim Sauer Aug 13 '21 at 11:40

1 Answers1

5

Let's break it down and have a look at eachDir Groovy extension method implementation first:

public static void eachDir(File self, @ClosureParams(value = SimpleType.class, options = "java.io.File") Closure closure) throws FileNotFoundException, IllegalArgumentException {
    eachFile(self, FileType.DIRECTORIES, closure);
}

What does eachFile do?

public static void eachFile(final File self, final FileType fileType, @ClosureParams(value = SimpleType.class, options = "java.io.File") final Closure closure)
        throws FileNotFoundException, IllegalArgumentException {
    checkDir(self);
    final File[] files = self.listFiles();
    // null check because of http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4803836
    if (files == null) return;
    for (File file : files) {
        if (fileType == FileType.ANY ||
                (fileType != FileType.FILES && file.isDirectory()) ||
                (fileType != FileType.DIRECTORIES && file.isFile())) {
            closure.call(file);
        }
    }
}

OK, so Groovy just calls Java's File#listFiles method under the covers and then iterates the result without interfering with the existing order.

Moving to the OpenJDK implementation we can see that Files#listFiles uses FileSystem#list via the normalizedList method.

FileSystem#list is abstract. Continuing to the two most popular implementations it turns out that both UnixFileSystem#list and Win32FileSystem#list have a native implementation:

@Override
public native String[] list(File f);

Windows

Digging into the Windows implementation:

JNIEXPORT jobjectArray JNICALL
Java_java_io_WinNTFileSystem_list(JNIEnv *env, jobject this, jobject file)
{
    WCHAR *search_path;
    HANDLE handle;
    WIN32_FIND_DATAW find_data;
    int len, maxlen;
    jobjectArray rv, old;
    DWORD fattr;
    jstring name;
    jclass str_class;
    WCHAR *pathbuf;
    DWORD err;

    str_class = JNU_ClassString(env);
    CHECK_NULL_RETURN(str_class, NULL);

    pathbuf = fileToNTPath(env, file, ids.path);
    if (pathbuf == NULL)
        return NULL;
    search_path = (WCHAR*)malloc(2*wcslen(pathbuf) + 6);
    if (search_path == 0) {
        free (pathbuf);
        errno = ENOMEM;
        JNU_ThrowOutOfMemoryError(env, "native memory allocation failed");
        return NULL;
    }
    wcscpy(search_path, pathbuf);
    free(pathbuf);
    fattr = GetFileAttributesW(search_path);
    if (fattr == INVALID_FILE_ATTRIBUTES) {
        free(search_path);
        return NULL;
    } else if ((fattr & FILE_ATTRIBUTE_DIRECTORY) == 0) {
        free(search_path);
        return NULL;
    }

    /* Remove trailing space chars from directory name */
    len = (int)wcslen(search_path);
    while (search_path[len-1] == L' ') {
        len--;
    }
    search_path[len] = 0;

    /* Append "*", or possibly "\\*", to path */
    if ((search_path[0] == L'\\' && search_path[1] == L'\0') ||
        (search_path[1] == L':'
        && (search_path[2] == L'\0'
        || (search_path[2] == L'\\' && search_path[3] == L'\0')))) {
        /* No '\\' needed for cases like "\" or "Z:" or "Z:\" */
        wcscat(search_path, L"*");
    } else {
        wcscat(search_path, L"\\*");
    }

    /* Open handle to the first file */
    handle = FindFirstFileW(search_path, &find_data);
    free(search_path);
    if (handle == INVALID_HANDLE_VALUE) {
        if (GetLastError() != ERROR_FILE_NOT_FOUND) {
            // error
            return NULL;
        } else {
            // No files found - return an empty array
            rv = (*env)->NewObjectArray(env, 0, str_class, NULL);
            return rv;
        }
    }

    /* Allocate an initial String array */
    len = 0;
    maxlen = 16;
    rv = (*env)->NewObjectArray(env, maxlen, str_class, NULL);
    if (rv == NULL) { // Couldn't allocate an array
        FindClose(handle);
        return NULL;
    }
    /* Scan the directory */
    do {
        if (!wcscmp(find_data.cFileName, L".")
                                || !wcscmp(find_data.cFileName, L".."))
           continue;
        name = (*env)->NewString(env, find_data.cFileName,
                                 (jsize)wcslen(find_data.cFileName));
        if (name == NULL) {
            FindClose(handle);
            return NULL; // error
        }
        if (len == maxlen) {
            old = rv;
            rv = (*env)->NewObjectArray(env, maxlen <<= 1, str_class, NULL);
            if (rv == NULL || JNU_CopyObjectArray(env, rv, old, len) < 0) {
                FindClose(handle);
                return NULL; // error
            }
            (*env)->DeleteLocalRef(env, old);
        }
        (*env)->SetObjectArrayElement(env, rv, len++, name);
        (*env)->DeleteLocalRef(env, name);

    } while (FindNextFileW(handle, &find_data));

    err = GetLastError();
    FindClose(handle);
    if (err != ERROR_NO_MORE_FILES) {
        return NULL; // error
    }

    if (len < maxlen) {
        /* Copy the final results into an appropriately-sized array */
        old = rv;
        rv = (*env)->NewObjectArray(env, len, str_class, NULL);
        if (rv == NULL)
            return NULL; /* error */
        if (JNU_CopyObjectArray(env, rv, old, len) < 0)
            return NULL; /* error */
    }
    return rv;
}

We can see a combination of FindFirstFileW, FindNextFileW and FindClose WinAPI functions used to iterate over the files. Excerpt about ordering from the documentation of FindNextFileW:

The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system.

(...)

The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.

So the implementation lists the files in a way that tries to be most optimal given the OS and file-system type constraints. No guarantees about a particular order.

*nix

What about *nix systems? Here's the code:

JNIEXPORT jobjectArray JNICALL
Java_java_io_UnixFileSystem_list(JNIEnv *env, jobject this,
                                 jobject file)
{
    DIR *dir = NULL;
    struct dirent *ptr;
    int len, maxlen;
    jobjectArray rv, old;
    jclass str_class;

    str_class = JNU_ClassString(env);
    CHECK_NULL_RETURN(str_class, NULL);

    WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) {
        dir = opendir(path);
    } END_PLATFORM_STRING(env, path);
    if (dir == NULL) return NULL;

    /* Allocate an initial String array */
    len = 0;
    maxlen = 16;
    rv = (*env)->NewObjectArray(env, maxlen, str_class, NULL);
    if (rv == NULL) goto error;

    /* Scan the directory */
    while ((ptr = readdir(dir)) != NULL) {
        jstring name;
        if (!strcmp(ptr->d_name, ".") || !strcmp(ptr->d_name, ".."))
            continue;
        if (len == maxlen) {
            old = rv;
            rv = (*env)->NewObjectArray(env, maxlen <<= 1, str_class, NULL);
            if (rv == NULL) goto error;
            if (JNU_CopyObjectArray(env, rv, old, len) < 0) goto error;
            (*env)->DeleteLocalRef(env, old);
        }
#ifdef MACOSX
        name = newStringPlatform(env, ptr->d_name);
#else
        name = JNU_NewStringPlatform(env, ptr->d_name);
#endif
        if (name == NULL) goto error;
        (*env)->SetObjectArrayElement(env, rv, len++, name);
        (*env)->DeleteLocalRef(env, name);
    }
    closedir(dir);

    /* Copy the final results into an appropriately-sized array */
    if (len < maxlen) {
        old = rv;
        rv = (*env)->NewObjectArray(env, len, str_class, NULL);
        if (rv == NULL) {
            return NULL;
        }
        if (JNU_CopyObjectArray(env, rv, old, len) < 0) {
            return NULL;
        }
    }
    return rv;

 error:
    closedir(dir);
    return NULL;
}

This time iteration is supported by the opendir/readdir/closedir trio. The POSIX documentation of readdir only mentions this about ordering:

The type DIR, which is defined in the header <dirent.h>, represents a directory stream, which is an ordered sequence of all the directory entries in a particular directory.

Linux documentation has a bit more to say:

The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.

Close enough to Windows there is no order-guarantees apart from that there is some order.

Conclusion

'Is not guaranteed' means that a particular feature is an implementation detail and that you shouldn't rely on it. Contrary to 'guaranteed' features, which due to backward compatibility promise to stay invariant for some (usually longer) period of time. Changes to those features are called breaking changes and are usually well documented in release notes and migration guides (see Vue 3 migration guide for example). They also get announced long before they become finalized and released - see deprecation (often leaving some time to the developers to adjust their code to the new even after they go live).

The behaviour of 'unguaranteed' features on the other hand may differ between given product/library versions, environments (e.g. JVM implementations, operating systems, product versions) or even particular invocations. They sometimes show some predictable characteristics, but they should not be relied upon. You need to make sure to take care of the guarantees you expect for your piece of code to to have and implement them on your own.

Wrapping-up your issue: if you expect any specific order of files just sort them first - even if it means that the order will stay the same.

jannis
  • 4,843
  • 1
  • 23
  • 53