
I am doing an os.walk() over a certain part of my OneDrive synced folder structure. It all worked fine until recently. Now ALL files from one specific directory are ignored. I tested several possible reasons and narrowed it down to this: The directory that is ignored is the one that holds the most files (897 at this point).

If I remove two of the files from said directory (it does not matter which two), it works and all files are recognized. When I add the files again, the result is the same: No files from that directory turn up in my os.walk() result list.

I did check Microsoft's "Restrictions and limitations in OneDrive and SharePoint" page, but I am far from any of the file size and file count limits mentioned there.

My code looks like this

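import os

# Use a separate name for the result list: os.walk's loop variable
# must not shadow (and overwrite) the list we are appending to.
matched_files = []
for root, dirs, filenames in os.walk(mainDirectory):
    if 'Common part' in root:
        for f in filenames:
            matched_files.append(os.path.join(root, f))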

'Common part' is a text string that all relevant folders in mainDirectory have in common.

The directory itself is always recognized; only its files are not added to my list. So I tried another approach using glob.glob(). Here, the results are a bit different but still not satisfactory:

import glob
import os

folders = []
for root, dirs, filenames in os.walk(mainDirectory):
    for d in dirs:
        if d.startswith('Common part'):
            folders.append(os.path.join(root, d))

# One list of matching .xlsx paths per folder
files = [glob.glob(os.path.join(f, '*.xlsx')) for f in folders]

This does give me approximately half the files from the problematic folder. Again, when I remove two files, it gives me the full list.

When I copy/move the files to a local (not OneDrive synced) path, it works. So I guess it does have to do with OneDrive. Having the files outside of OneDrive is not an option.

The directory in question is not directly in my OneDrive but a "Sync"/"Shortcut" from SharePoint.

All files can be opened; they are downloaded, not on-demand. I have removed the sync and re-synced the folder. I have restarted OneDrive (and my machine) several times.

I am really at a loss here. Any hints welcome!

Update: Thanks to the help of @GordonAitchJay, we established that above a certain threshold of files (or sum of file sizes?), functions like os.listdir() and win32file.FindFilesW() stop returning their usual output and instead fail with WinError 87; for os.listdir() this is OSError: [WinError 87] The parameter is incorrect
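A note on why the files disappeared silently instead of raising: os.walk() ignores errors from its underlying directory scan unless an onerror callback is passed. The following is only a minimal sketch of how such errors could be surfaced; walk_with_errors is a hypothetical helper name, not part of the original code.

```python
import os

def walk_with_errors(top):
    """Walk `top` and return (file_paths, errors) instead of hiding failures."""
    errors = []
    file_paths = []
    # os.walk calls `onerror` with the OSError it would otherwise ignore.
    for root, dirs, filenames in os.walk(top, onerror=errors.append):
        for name in filenames:
            file_paths.append(os.path.join(root, name))
    return file_paths, errors
```

Each collected exception carries the failing path in its filename attribute and, on Windows, the error code in winerror, so a directory that fails with WinError 87 would show up in the errors list rather than simply being skipped.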

Also, in the meantime, we reproduced the same behaviour on another machine within the same organization. We tried this after a full reset of my OneDrive did not bring any improvement.

windSpiel
  • Are you able to open these files ? Are you sure the sync is still working ? Are these files "on demand" ? Have you tried to unsync and sync again this directory ? I have worked a lot with onedrive and there are lots of weird bugs that got solved just with a onedrive.exe /reset and a full resync – Maxime Mar 15 '23 at 10:59
  • Thank you for the questions, should have mentioned that above: Yes, I have tried all of that. All files can be opened, they are downloaded, not on-demand. I have removed the sync and re-synced the folder. I have restarted OneDrive (and my machine) several times. – windSpiel Mar 15 '23 at 11:00
  • The directory in question is not directly in my OneDrive but a "Sync"/"Shortcut" from SharePoint – windSpiel Mar 15 '23 at 11:02
  • What does `os.listdir()` return when you pass in the path of the directory with 897 files? And then again when you remove 2 files? – GordonAitchJay Mar 15 '23 at 11:02
  • That seems to be a very good question: With 895 files, I get a list of file names (as expected) With 897 files, I get 'OSError: [WinError 87] The parameter is incorrect:' followed by the path. – windSpiel Mar 15 '23 at 11:07
  • Fascinating. You must be triggering OneDrive to do something with that directory when you remove those 2 files. Using Task Manager, terminate the OneDrive.exe process and stop the OneSyncSvc service, then try calling `os.listdir()` again, with and without the 2 extra files. – GordonAitchJay Mar 15 '23 at 11:17
  • I terminated OneDrive and stopped the OneSyncSvc services (had two instances of them). Result for os.listdir() is the same as before with WinError 87 – windSpiel Mar 15 '23 at 11:27
  • Wow, so it returned a list of 895 files, but then came back with WinError 87 when the directory has 897 files? I'm stumped! – GordonAitchJay Mar 15 '23 at 11:35
  • That's what happens. Just tried it again (got the numbers wrong previously): Once there are 895 files, it returns the error; until then, it lists the files. – windSpiel Mar 15 '23 at 12:47
  • `import win32file` then try `len(win32file.FindFilesW(r"C:\OneDrive\BigFolder\*"))`. You need to have `\*` at the end. You also might need to `pip install pywin32`. Note it also returns hidden files like `Thumbs.db` and also `.` and `..`. Under the hood, this uses win32's `FindFirstFileW`, `FindNextFileW`, `FindClose`, just like `os.listdir`. I expect to see the same behaviour. – GordonAitchJay Mar 16 '23 at 09:52
  • What is your OneDrive directory? Is it a UNC path (like `\\OtherComputer\OneDrive`), or a UNC path mapped to a drive (like `Z:\OneDrive`)? – GordonAitchJay Mar 16 '23 at 09:53
  • Thank you @GordonAitchJay for your continuous assistance on this! FindFilesW does indeed show the same behaviour: With two files less it gives me 896 items, with all the files in it, it returns "error: (87, 'FindNextFileW', 'The parameter is incorrect.')". My OneDrive is on a path like "C:\Users\UserName\OneDriveFolder" – windSpiel Mar 17 '23 at 11:52
  • No worries. It's a very strange problem. I'll try to replicate it myself tomorrow if I have time. At least we know what function it's failing at. Google didn't return much, and the documentation for `FindNextFileW` doesn't explain why the parameter might be incorrect. Anyway, try this: `for file in win32file.FindFilesIterator(r"C:\OneDrive\BigFolder\*"): print(file)`. This yields one file at a time. It'll be interesting if it doesn't fail on the first file. – GordonAitchJay Mar 17 '23 at 13:10
  • Yeah, websearch does not seem to be very helpful here - tried that a lot in the past days. `FindFilesIterator()` actually returns 443 entries before the error occurs. That is very similar to the `glob.glob()` behavior (441 files; the difference is probably due to the inclusion of _hidden files_, _._ and _.._). Funnily enough, when I remove **1** file, the error already occurs **after 25/23 files respectively**. When I remove a second file, we're back at expected behavior/results (896/894 files). – windSpiel Mar 17 '23 at 13:27
  • Incredible! The fact that it only returns 25/23 files if you only remove 1 file blows my mind. What happens if you copy the entire directory, all 894 files, to somewhere not monitored/synced by OneDrive, for example: `C:\Users\UserName\OneDriveFolde1\BigFolder\`. Note the `1` replacing the `r` at the end of `OneDriveFolder` so as to keep the length of all the filepaths the same as before. Then try `for file in win32file.FindFilesIterator(r"C:\Users\UserName\OneDriveFolde1\BigFolder\*"): print(file)`. I was highly suspicious of OneDrive before, but maybe it isn't causing the problem. – GordonAitchJay Mar 18 '23 at 08:41
  • @GordonAitchJay: I just tried your suggestion. I copied the whole folder to a local directory, keeping the path length. It works when it's local only. To double check, I also copied the whole thing to my OneDrive root directory, in an effort to minimize the path length. This was interesting: Right after copying, it worked (when OneDrive had just started uploading the files). But a couple of minutes later (once a number of the files had been uploaded), Error 87 was back. Even though not all files are synced yet, it's again 443 that work. – windSpiel Mar 22 '23 at 08:06
  • That's interesting, but unsurprising. OneDrive is no doubt responsible. I don't have a SharePoint server, so I can't replicate the problem, and unfortunately the source code for the OneDrive Windows client/service is not available, so I can't see what it's getting up to. You'll need to submit an issue ticket/bug report via your organisation to Microsoft. – GordonAitchJay Mar 22 '23 at 11:06
  • Do the same tests as before, but with these commands run in cmd.exe (they all just list the directory's files): 1) `dir C:\Users\UserName\OneDriveFolder\BigFolder` 2) `powershell -command ls C:\Users\UserName\OneDriveFolder\BigFolder` 3) `powershell -command gci C:\Users\UserName\OneDriveFolder\BigFolder`. If I recall correctly, they call `FindNextFileEx` in addition to `FindNextFileW`. I suspect they will not show all the files. Interestingly, Explorer does not seem to call `FindNextFileW`. Nevertheless, it does eventually call `NtQueryDirectoryFile` which is what `FindNextFileW` calls. – GordonAitchJay Mar 22 '23 at 11:20
  • Run the code in [Eryk Sun's answer](https://stackoverflow.com/a/27448955/3589122) (copy the code under the headings "ctypes definitions", "DirEntry and ntlistdir", "Example" into a python file, but pass the right directory to `ntlistdir()`). It calls the lower level `NtQueryDirectoryFile` function, which is ultimately what `FindNextFileW` and Explorer call. I suspect you will get the correct list of files. – GordonAitchJay Mar 22 '23 at 11:33
  • I did all of the above. 1)-3) return lists of files. They all run through, no errors. 1) returned "896 File(s)", so it actually did find them all. 2) and 3) each returned lists of 897 entries, so also no shortage. Eryk Sun's code also returned the full list (as you suspected). So I can probably solve my issues with this, although I must admit that I fail to understand what's going wrong in the first place. – windSpiel Mar 22 '23 at 15:38
  • Dear @GordonAitchJay, the code you referenced is now working in my production environment. I would have liked to fix the root cause of the problem instead of working around it, but I'm very glad we found a way to make it work. You have. I really appreciate your help! Please do add a response here so I can highlight it as a solution and give you proper credit. – windSpiel Mar 23 '23 at 09:52
  • You're welcome windSpiel. Dang dude, I can't believe 1)-3) returned all the files. What if you navigate to the directory using another file manager like 7-Zip (7-Zip calls `FindNextFileW`)? If that works I will be shocked. Also, I expect this to fail, but try anyway: `os.listdir(r"\\?\C:\Users\UserName\OneDriveFolder\BigFolder")`. Yeah, it's totally wild that you have to resort to this. There are a few Microsoft employees who contribute to CPython. I will try to bring this to their attention. Hopefully they can get to the bottom of it. – GordonAitchJay Mar 23 '23 at 14:41
  • @windSpiel FYI I have submitted an issue on CPython's Github issue tracker. https://github.com/python/cpython/issues/102993 – GordonAitchJay Mar 24 '23 at 05:57
  • 7-Zip finds the first 441 files, then stops. So I guess it gets the same error but just does not bother its users with it. The os.listdir() call you suggested also fails with WinError 87. – windSpiel Mar 24 '23 at 11:02
  • Thank you for posting this. I just found myself in the same spot: OneDrive seems to be stopping `os.listdir` and `glob` from performing their actions appropriately in a recently-created folder with 670 files inside it. When I ran a `glob` file search in that folder, I got back the `OSError: [WinError 87] The parameter is incorrect` message. I ended up using the code below to work around it. – Felipe D. Aug 30 '23 at 20:25
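The diagnostic used throughout the comments (iterate one entry at a time and note how many come back before the error) can be sketched in plain Python. This is only an illustration under assumptions: count_until_error and probe_directory are hypothetical helper names, and the sketch uses os.scandir rather than the pywin32 iterator from the comments, since pywin32 raises its own exception type instead of OSError.

```python
import os

def count_until_error(entries):
    """Consume an iterable of entries; return (count, error_or_None)."""
    count = 0
    iterator = iter(entries)
    while True:
        try:
            next(iterator)
        except StopIteration:
            return count, None   # listing finished cleanly
        except OSError as exc:
            return count, exc    # listing broke off after `count` entries
        count += 1

def probe_directory(path):
    """Count how many entries os.scandir yields for `path` before failing."""
    try:
        with os.scandir(path) as it:
            return count_until_error(it)
    except OSError as exc:
        # Opening the directory itself failed.
        return 0, exc
```

Note that os.scandir does not yield `.` and `..`, so its counts would differ by two from the FindFilesIterator numbers quoted above.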

1 Answer

Though I can't prove it, it seems that OneDrive is up to some sort of tomfoolery that causes win32's FindNextFileW to fail with an ERROR_INVALID_PARAMETER error, but apparently only when it is called by Python's os.walk, os.listdir, and win32file.FindFilesW, and when some files have been deleted from the OneDrive directory syncing a SharePoint folder. Utterly bizarre. I'm thinking maybe OneDrive hooks FindNextFileW, and that the hook remains in place after ending the OneDrive process and services with Task Manager.

A workaround is to use ctypes to call the lower level NtQueryDirectoryFile function (which is ultimately what FindNextFileW calls anyway).

Eryk Sun's answer to another question has a working example. I have copied it below, changing only the last couple of lines:

import os
import msvcrt
import ctypes

from ctypes import wintypes

ntdll = ctypes.WinDLL('ntdll')
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

def NtError(status):
    err = ntdll.RtlNtStatusToDosError(status)
    return ctypes.WinError(err)

NTSTATUS = wintypes.LONG
STATUS_BUFFER_OVERFLOW = NTSTATUS(0x80000005).value
STATUS_NO_MORE_FILES = NTSTATUS(0x80000006).value
STATUS_INFO_LENGTH_MISMATCH = NTSTATUS(0xC0000004).value

ERROR_DIRECTORY = 0x010B
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
GENERIC_READ = 0x80000000
FILE_SHARE_READ = 1
OPEN_EXISTING = 3
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000
FILE_ATTRIBUTE_DIRECTORY = 0x0010

FILE_INFORMATION_CLASS = wintypes.ULONG
FileDirectoryInformation = 1
FileBasicInformation = 4

LPSECURITY_ATTRIBUTES = wintypes.LPVOID
PIO_APC_ROUTINE = wintypes.LPVOID
ULONG_PTR = wintypes.WPARAM

class UNICODE_STRING(ctypes.Structure):
    _fields_ = (('Length',        wintypes.USHORT),
                ('MaximumLength', wintypes.USHORT),
                ('Buffer',        wintypes.LPWSTR))

PUNICODE_STRING = ctypes.POINTER(UNICODE_STRING)

class IO_STATUS_BLOCK(ctypes.Structure):
    class _STATUS(ctypes.Union):
        _fields_ = (('Status',  NTSTATUS),
                    ('Pointer', wintypes.LPVOID))
    _anonymous_ = '_Status',
    _fields_ = (('_Status',     _STATUS),
                ('Information', ULONG_PTR))

PIO_STATUS_BLOCK = ctypes.POINTER(IO_STATUS_BLOCK)

ntdll.NtQueryInformationFile.restype = NTSTATUS
ntdll.NtQueryInformationFile.argtypes = (
    wintypes.HANDLE,        # In  FileHandle
    PIO_STATUS_BLOCK,       # Out IoStatusBlock
    wintypes.LPVOID,        # Out FileInformation
    wintypes.ULONG,         # In  Length
    FILE_INFORMATION_CLASS) # In  FileInformationClass

ntdll.NtQueryDirectoryFile.restype = NTSTATUS
ntdll.NtQueryDirectoryFile.argtypes = (
    wintypes.HANDLE,        # In     FileHandle
    wintypes.HANDLE,        # In_opt Event
    PIO_APC_ROUTINE,        # In_opt ApcRoutine
    wintypes.LPVOID,        # In_opt ApcContext
    PIO_STATUS_BLOCK,       # Out    IoStatusBlock
    wintypes.LPVOID,        # Out    FileInformation
    wintypes.ULONG,         # In     Length
    FILE_INFORMATION_CLASS, # In     FileInformationClass
    wintypes.BOOLEAN,       # In     ReturnSingleEntry
    PUNICODE_STRING,        # In_opt FileName
    wintypes.BOOLEAN)       # In     RestartScan

kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = (
    wintypes.LPCWSTR,      # In     lpFileName
    wintypes.DWORD,        # In     dwDesiredAccess
    wintypes.DWORD,        # In     dwShareMode
    LPSECURITY_ATTRIBUTES, # In_opt lpSecurityAttributes
    wintypes.DWORD,        # In     dwCreationDisposition
    wintypes.DWORD,        # In     dwFlagsAndAttributes
    wintypes.HANDLE)       # In_opt hTemplateFile

class FILE_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = (('CreationTime',   wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime',  wintypes.LARGE_INTEGER),
                ('ChangeTime',     wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.ULONG))

class FILE_DIRECTORY_INFORMATION(ctypes.Structure):
    _fields_ = (('_Next',          wintypes.ULONG),
                ('FileIndex',      wintypes.ULONG),
                ('CreationTime',   wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime',  wintypes.LARGE_INTEGER),
                ('ChangeTime',     wintypes.LARGE_INTEGER),
                ('EndOfFile',      wintypes.LARGE_INTEGER),
                ('AllocationSize', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.ULONG),
                ('FileNameLength', wintypes.ULONG),
                ('_FileName',      wintypes.WCHAR * 1))

    @property
    def FileName(self):
        addr = ctypes.addressof(self) + type(self)._FileName.offset
        size = self.FileNameLength // ctypes.sizeof(wintypes.WCHAR)
        return (wintypes.WCHAR * size).from_address(addr).value

class DirEntry(FILE_DIRECTORY_INFORMATION):
    def __repr__(self):
        return '<{} {!r}>'.format(self.__class__.__name__, self.FileName)

    @classmethod
    def listbuf(cls, buf):
        result = []
        base_size = ctypes.sizeof(cls) - ctypes.sizeof(wintypes.WCHAR)
        offset = 0
        while True:
            fdi = cls.from_buffer(buf, offset)
            if fdi.FileNameLength and fdi.FileName not in ('.', '..'):
                cfdi = cls()
                size = base_size + fdi.FileNameLength
                ctypes.resize(cfdi, size)
                ctypes.memmove(ctypes.byref(cfdi), ctypes.byref(fdi), size)
                result.append(cfdi)
            if fdi._Next:
                offset += fdi._Next
            else:
                break
        return result

def isdir(path):
    if not isinstance(path, int):
        return os.path.isdir(path)
    try:
        hFile = msvcrt.get_osfhandle(path)
    except IOError:
        return False
    iosb = IO_STATUS_BLOCK()
    info = FILE_BASIC_INFORMATION()
    status = ntdll.NtQueryInformationFile(hFile, ctypes.byref(iosb),
                ctypes.byref(info), ctypes.sizeof(info),
                FileBasicInformation)
    return bool(status >= 0 and info.FileAttributes & FILE_ATTRIBUTE_DIRECTORY)

def ntlistdir(path=None):
    result = []

    if path is None:
        path = os.getcwd()

    if isinstance(path, int):
        close = False
        fd = path
        hFile = msvcrt.get_osfhandle(fd)
    else:
        close = True
        hFile = kernel32.CreateFileW(path, GENERIC_READ, FILE_SHARE_READ,
                    None, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, None)
        if hFile == INVALID_HANDLE_VALUE:
            raise ctypes.WinError(ctypes.get_last_error())
        fd = msvcrt.open_osfhandle(hFile, os.O_RDONLY)

    try:
        if not isdir(fd):
            raise ctypes.WinError(ERROR_DIRECTORY)
        iosb = IO_STATUS_BLOCK()
        info = (ctypes.c_char * 4096)()
        while True:
            status = ntdll.NtQueryDirectoryFile(hFile, None, None, None,
                        ctypes.byref(iosb), ctypes.byref(info),
                        ctypes.sizeof(info), FileDirectoryInformation,
                        False, None, False)
            if (status == STATUS_BUFFER_OVERFLOW or
                iosb.Information == 0 and status >= 0):
                info = (ctypes.c_char * (ctypes.sizeof(info) * 2))()
            elif status == STATUS_NO_MORE_FILES:
                break
            elif status >= 0:
                sublist = DirEntry.listbuf(info)
                result.extend(sublist)
            else:
                raise NtError(status)
    finally:
        if close:
            os.close(fd)

    return result

for entry in ntlistdir(r"C:\Users\UserName\OneDriveFolder\BigFolder"):
    print(entry.FileName)
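Since the original question used os.walk(), here is a rough sketch of a recursive walk whose directory-listing function can be swapped out. It is an illustration, not part of Eryk Sun's code: walk_files and its listdir parameter are hypothetical names, and it assumes os.path.isdir() keeps working on the affected paths, which was not verified in this thread.

```python
import os

def walk_files(top, listdir=os.listdir):
    """Yield file paths under `top`, using `listdir` to enumerate each directory."""
    pending = [top]
    while pending:
        current = pending.pop()
        for name in listdir(current):
            full = os.path.join(current, name)
            if os.path.isdir(full):
                # Descend later; note that isdir follows symlinks.
                pending.append(full)
            else:
                yield full
```

With the code above, the listing function could be supplied as `lambda p: [e.FileName for e in ntlistdir(p)]`, so that every directory is enumerated through NtQueryDirectoryFile rather than FindNextFileW.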
GordonAitchJay