
In short: os.walk() gives me an accurate list of files, but when I try to get information about those files (last modified date, file size) or even just open() them, I get a file-not-found error for some of them. It affects roughly 0.2% of files, for reasons that are unclear.

Background

At work we have a server running Windows Server 2012 R2 (I know, I know..). We want to automate moving targeted shared folders to specific shared drives in Google Drive.

The first thing I wanted to do was get a list of files and their last modified dates and file sizes to be used later. The code I wrote worked fine on my laptop which was running Windows 11, but when I tried pointing it at a few different share folders on the server it ran into the same issue repeatedly.

Troubleshooting

I don't think it's a code issue. I've simplified the code several times and always end up with the same result: it works locally but fails to get through the share folders completely.

My first thought was maybe it was due to long path names (the 255 character limit on older systems) but it was successfully finding files whose paths were > 300 characters.
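
One way to test the long-path theory directly is to retry a failing path in Windows' extended-length form, where a UNC path such as `\\server\share\...` becomes `\\?\UNC\server\share\...`. The sketch below is a hypothetical helper (the name `to_extended_length` is not part of the original script). It assumes the input is an absolute path, and it converts forward slashes back to backslashes because extended-length paths are passed through without normalisation.

```python
def to_extended_length(path: str) -> str:
    """Return `path` in Windows extended-length form (hypothetical helper).

    Assumes `path` is absolute: either a UNC path or a drive-letter path.
    """
    # Extended-length paths are not normalised, so separators must be backslashes.
    p = path.replace("/", "\\")
    if p.startswith("\\\\?\\"):
        # Already in extended-length form.
        return p
    if p.startswith("\\\\"):
        # UNC path: \\server\share\... -> \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + p[2:]
    # Drive-letter path: C:\... -> \\?\C:\...
    return "\\\\?\\" + p
```

If `os.stat(to_extended_length(f))` succeeds where `os.stat(f)` fails, path length is a likely culprit; if both fail, the cause is probably something else.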

My next thought was maybe there's a clear pattern to what kind of files can't be found, but in a given folder it'd find most of the PDFs successfully but then fail to locate one or a few of the others. This is just an observed example, it is not specific to PDFs.
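
To look for a pattern more systematically than by eye, the failing paths collected in `files_not_found` could be summarised by extension and by length. This is a minimal sketch, assuming a hypothetical helper named `summarise_failures` and the classic Win32 `MAX_PATH` value of 260 (which counts the terminating NUL):

```python
import os
from collections import Counter

MAX_PATH = 260  # classic Win32 limit, including the terminating NUL

def summarise_failures(paths):
    """Summarise failing paths by extension and length (hypothetical helper)."""
    lengths = [len(p) for p in paths]
    return {
        # How many failures per (lower-cased) file extension
        "by-extension": Counter(os.path.splitext(p)[1].lower() for p in paths),
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
        # Paths too long to fit in a MAX_PATH-sized buffer
        "at-or-over-max-path": sum(n >= MAX_PATH for n in lengths),
    }
```

If every failing path turns out to be at or over the limit while shorter paths in the same folders succeed, that points at path length rather than file type.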

I've dedicated probably 6-8 hours total into trying to troubleshoot and investigate this but I'm pretty stumped at this point.

Code

do_test.py - uses the hurry.filesize package for rough file sizes

import os
import datetime
from hurry.filesize import size
from pprint import pprint

# Test directory
src = "//[DC]/PATH/TO/FOLDER"

def simple_file_check(src_dir):
    total_bytes = 0
    total_files = 0
    total_folders = 0
    total_not_found = 0
    files_not_found = []

    for (root, dirs, files) in os.walk(src_dir):
        # just count files and folders for now
        total_files += len(files)
        total_folders += len(dirs)
        # Get full-path file names
        fnames = [os.path.join(root, f).replace("\\","/") for f in files]

        # Get their sizes and sum it up
        fsizes = []
        for f in fnames:
            try:
                fsizes.append(os.stat(f).st_size)
            except Exception as e:
                files_not_found.append(f)
        total_bytes += sum(fsizes)

    total_size = size(total_bytes)
    total_not_found += len(files_not_found)
    # Share of listed files that could not be stat'ed, as a percentage
    pct_missing = total_not_found / total_files * 100

    data = {
        "ttl-size": total_size,
        "ttl-files": total_files,
        "ttl-folders": total_folders,
        "ttl-not-found": total_not_found,
        "pct-missing": "{}%".format(pct_missing)
    }
    pprint(data)

def time_it_pls(func, *arg):
    begin_dt = datetime.datetime.now()
    begin = str(begin_dt)[:19]
    print("beginning execution at: {}".format(begin))
    func(*arg)
    end_dt = datetime.datetime.now()
    end = str(end_dt)[:19]
    print("ending execution at: {}".format(end))
    print("time taken: {}".format(end_dt - begin_dt))

time_it_pls(simple_file_check, src)

result

beginning execution at: 2023-06-21 14:50:06
{'pct-missing': '0.19806269922322284%',
 'ttl-files': 193878,
 'ttl-folders': 18150,
 'ttl-not-found': 384,
 'ttl-size': '210G'}
ending execution at: 2023-06-21 14:51:11
time taken: 0:01:05.302772

specific error message without the Exception block

Traceback (most recent call last):
  File "C:\it_scripts\do_test.py", line 53, in <module>   
    time_it_pls(simple_file_check, src)
  File "C:\it_scripts\do_test.py", line 47, in time_it_pls
    func(*arg)
  File "C:\it_scripts\do_test.py", line 25, in simple_file_check
    fsizes.append(os.stat(f).st_size)
                  ^^^^^^^^^^
FileNotFoundError: [WinError 3] The system cannot find the path specified: '//DC/PATH/TO/FILE'
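
Note that the code here is WinError 3 ("path not found") rather than WinError 2 ("file not found"). To keep that detail for every failing file instead of only its name, the `except` branch could record the error alongside the path. A minimal sketch, assuming a hypothetical helper `stat_size_or_error` that is not part of the script above:

```python
import os

def stat_size_or_error(path):
    """Return (size, None) on success or (None, details) on failure."""
    try:
        return os.stat(path).st_size, None
    except OSError as e:
        details = {
            "path": path,
            "path-length": len(path),
            "type": type(e).__name__,
            "errno": e.errno,
            # The winerror attribute only exists on Windows, hence getattr
            "winerror": getattr(e, "winerror", None),
        }
        return None, details
```

Collecting these dictionaries instead of bare paths would show whether every failure carries the same error code and how the failures relate to path length.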

--edit--

I get a similar error when trying to use open() against just an individual file like so in the interpreter.

>>> f = "//DC/PATH/TO/FILE" # actual path length is 267 characters long and copied from the exception in the previous example.
>>> d = open(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '//DC/PATH/TO/FILE'

--edit 2--

We are getting closer! Just listing the folder in PowerShell, I can see that the file exists, but if I run ls against the individual file I get an error. So it's not Python-specific and hints at something weird on the Windows side.

Here's a much less redacted version of the output and error on the PS-side. Please understand that this does need to be redacted to an extent due to the sensitive nature of these files.

PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\
Rejections\"


    Directory: \\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing
    Letters\Rejections


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         5/31/2019   3:06 PM          13025 Samole closing Letter - No DV Simple assault.docx
-a----        11/21/2018   3:10 PM          16232 Sample Closing Letter-Not a qualifying crime (Sp).dotx
-a----         7/26/2018  11:32 AM          13581 Sample Closing Letter-RE PC does not qualify a indirect victim.dotx
-a----        11/21/2018   3:14 PM          12908 Sample Closing Letter-RE U Cert Request Denied.dotx
-a----          7/9/2018   7:25 PM          13500 Sample Closing Letter-Unqualifying crime.dotx
-a----         7/26/2018   6:19 PM          12769 Sample Closing Ltr w Copy of File (Sp), Over Income.dotx
-a----         7/26/2018   1:24 PM          16432 Sample Rejection Letter, unqualifying crime.dotx


PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\
Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx"
ls : Cannot find path '\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing 
Letters\Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx' because it does not exist.
At line:1 char:1
+ ls "\\DC\#CONSOLIDATION  ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\DC\...ect victim.dotx:String) [Get-ChildItem], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
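
Since the directory listing itself reports Length and LastWriteTime for every file, including the one that cannot be addressed by path, one possible workaround for the inventory step is to read size and modification time from the directory entries rather than from a second per-path call. The sketch below uses `os.scandir`, whose `DirEntry.stat()` on Windows is documented to return data from the directory enumeration without an extra system call for ordinary files. The function name `walk_with_listing_metadata` is hypothetical, and whether this sidesteps the failure on this particular share is untested. It would not help with `open()`.

```python
import os

def walk_with_listing_metadata(top):
    """Yield (path, size, mtime) for every file under `top`.

    Metadata comes from the directory entries; on Windows this avoids
    a separate per-path stat call for ordinary files.
    """
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Descend into subdirectories later
                    stack.append(entry.path)
                else:
                    info = entry.stat(follow_symlinks=False)
                    yield entry.path, info.st_size, info.st_mtime
```

Comparing the sizes returned this way with the `os.stat` results for the same folder would show whether the 384 missing files become readable from the listing alone.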
sanigirl
  • Don't catch all exceptions in this way. You are losing important debug information. There is `traceback.print_exc` which prints the whole exception traceback when placed in the `except` block. – Michael Butscher Jun 21 '23 at 22:19
  • Remove the error catcher. Once the error appears, please post it. – SimplyDev Jun 21 '23 at 22:21
  • @SimplyDev posted – sanigirl Jun 21 '23 at 22:23
  • @MichaelButscher usually yes, but as this is supposed to be an extremely simplified test with a single error that's being caught it is okay here. I've posted the specific error I am catching. – sanigirl Jun 21 '23 at 22:23
  • @Barmar that's irrelevant to the problem at hand. – sanigirl Jun 21 '23 at 22:28
  • @Barmar no, I've changed the actual folder so I can post the code publicly. You can see the successful output in the result block. – sanigirl Jun 21 '23 at 22:33
  • I understand changing names, but you need to be *consistent*. Otherwise it's hard to tell precisely what's going wrong. – Barmar Jun 21 '23 at 22:35
  • Does the file that's named in the error message exist or not? – Barmar Jun 21 '23 at 22:35
  • @Barmar yes it does. – sanigirl Jun 21 '23 at 22:35
  • I've also stated in the post that it can't find files that *do* exist. so.. – sanigirl Jun 21 '23 at 22:36
  • The pathname is missing the device prefix like `C:`. Maybe it's looking on the wrong device? – Barmar Jun 21 '23 at 22:37
  • Have you tried adding a retry step in the script? Maybe retry `os.stat` just once if it fails for a particular file? – slothrop Jun 21 '23 at 22:37
  • @Barmar are you familiar with what a share folder and a DC are? – sanigirl Jun 21 '23 at 22:38
  • Sorry, no -- I'm a Unix/Mac guy, I don't know Windows details like that. I've just never seen a Windows pathname that didn't start with a letter prefix. – Barmar Jun 21 '23 at 22:39
  • @Barmar okay well, basically this is accessing files on a server that is separate from the local system the script's being run from. If you have permission to use those folders, you can visit them via //servername/path/to/folder. – sanigirl Jun 21 '23 at 22:41
  • @saniboy, I still think that the file path is invalid. Try going into a terminal and running `cd //DC/PATH/TO/FILE` with no other changes. – SimplyDev Jun 21 '23 at 22:44
  • @slothrop unfortunately it's not just a single one, there are random files across multiple servers it seems to fail against. I just tried using `open()` on an individual file I know python's not finding and have shared the error. It's... a very strange scenario. – sanigirl Jun 21 '23 at 22:50
  • @SimplyDev I can CD into the specific directories and can successfully run `ls //DC/PATH/TO/FILE-FROM-EXCEPTION`, I'll add that to my troubleshooting section momentarily. – sanigirl Jun 21 '23 at 22:51
  • @SimplyDev actually, no I can't, which is interesting?? It's listed if I run LS against the folder, but if I try against the individual file it doesn't work. – sanigirl Jun 21 '23 at 22:57
  • Maybe it's the replacing `\\ ` with `/` that's causing the problem? – Mark Ransom Jun 21 '23 at 22:59
  • `actual path length is 267 characters long` Doesn't Windows have a [path length limit](https://stackoverflow.com/questions/1880321/why-does-the-260-character-path-length-limit-exist-in-windows) of 260? There's a registry key you can modify to fix this, but I forget what it is. – Nick ODell Jun 21 '23 at 23:01
  • @NickODell `My first thought was maybe it was due to long path names (the 255 character limit on older systems) but it was successfully finding files whose paths were > 300 characters.` – sanigirl Jun 21 '23 at 23:03
  • @NickODell ah you know what actually, long paths are enabled on the client-end but it'd make sense if it was failing at the server end. Reading more into it, it looks like it's not possible to lift the limit for 2012-r2 and I'd need to be on at least 2016 https://superuser.com/questions/1528789/windows-server-2012-r2-standard-enable-ntfs-long-paths-policy-option-missing#1528883 – sanigirl Jun 21 '23 at 23:22
  • Does this check out to you? – sanigirl Jun 21 '23 at 23:22
  • strange again that it was getting data on files > 300 but :| it sounds like I might need to finally push us off of 2012 and into more modern server OS's. If anything I appreciate having other technical folk I could look into this with because it's been a real pain. – sanigirl Jun 21 '23 at 23:27
