
I have a list of about 600 filenames in a file called upload_filenames.txt, and I want to find out where they are in a tree that has about 0.7M files in 9K subdirectories.

This (from this question) does the job:

for /F "usebackq delims=" %%i in (upload_filenames.txt) do (
    for /F "delims=" %%b in ('dir /B /S /A:-D "%%i"') do (
        echo %%~nxb;"%%~fb" >> exists.txt
    )
)

Now, in the same loop, I'd also like to fill a second file with all files not found. (I can get it manually from both lists, but I'd prefer the automated way.)

So far, I've learned that a FOR loop and if exist return errorlevel 0 when successful, but on failure they only print 'File Not Found' and set no errorlevel, so I can't use those. Is there any way to do this in a batch file?

Aside: I don't care about efficiency. The script above took about 10 hours to complete. So be it - for now.

Windows 10 or Server 2008

Compo
RolfBly
  • For files found, what about a one liner, `@(For /F Delims^= %%G In ('Dir /B /S /A:-D ^| "%SystemRoot%\System32\findstr.exe" /E /I /L /G:"upload_filenames.txt"') Do @Echo %%~nxG,%%G) 1> "exists.txt"`? – Compo Nov 11 '20 at 14:32
  • @Compo Nice! Hadn't come across `findstring` yet. – RolfBly Nov 11 '20 at 15:24

3 Answers


Redirecting the whole loop in one go is much faster than writing line by line. The repeated dir also takes a lot of time. Do the dir just once (into a file) and work with that result. findstr is quite efficient, so I guess it's faster to postprocess than to use an if within the for loop.

@echo off
setlocal
dir /b /s /a-d * > "files.txt"
(for /F "usebackq delims=" %%i in ("upload_filenames.txt") do (
  for /f "delims=" %%b in ('findstr /iec:"\\%%i" "files.txt" ^|^| echo ~') do (
    echo %%i;%%b
  )
)) > "result.txt"
findstr /ev ";~" "result.txt" > "existing.txt"
findstr /e ";~" "result.txt" > "missing.txt"
rem del "result.txt"

The key line is findstr /iec:"\\%%i" files.txt || echo ~. findstr outputs the matching line when it ends with the file name, and outputs nothing if the file is not found. In the latter case, the echo command executes and outputs ~, because || acts as "if previous command failed then" (source). You can change ~ to any string you want, but it has to be something, because the for loop skips empty lines.
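
To try the fallback on its own, here is a minimal sketch. It assumes a hypothetical scratch file sample.tmp holding a single made-up path; neither the file nor the path is part of the script above.

```batch
@echo off
rem Hypothetical scratch file with one made-up path (an assumption for this demo).
> "sample.tmp" echo C:\data\report.pdf
rem Match: findstr prints the line and succeeds, so the echo after || is skipped.
findstr /iec:"\\report.pdf" "sample.tmp" || echo ~
rem No match: findstr prints nothing and fails, so the echo after || runs.
findstr /iec:"\\nothere.pdf" "sample.tmp" || echo ~
del "sample.tmp"
```

The first findstr line should print the stored path and the second should print only ~. Outside of a for /F in-clause, the || needs no caret escaping.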

Stephan
  • `||` acts as "if previous command failed then" (there is also `&&`, "if previous command was successful then"). Escaping special characters (like `|`) is necessary within the `in` clause (as it is executed in a secondary process). `Setlocal` isn't really necessary here (it's a habit of mine to use it so that no variables stay alive after the script ends). – Stephan Nov 11 '20 at 16:26
  • [Source](https://ss64.com/nt/syntax-redirection.html) of the `||` trick. – Stephan Nov 11 '20 at 16:30
  • Perhaps you should change the `findstr` command line to `findstr /I /E /C:"\\%%i" "files.txt"` to avoid problems with spaces in file names (remember `findstr` separates multiple search strings with spaces unless `/C` is provided) and to force literal searching. Also files whose names begin with a meta-character (like `.`, `[`, `]`, `-`) may fail as the `\` becomes consumed for escaping (even for literal searches!), so `\\` is an escaped backslash that is maintained… – aschipfl Nov 11 '20 at 22:20
  • Thank you, @aschipfl, that's correct. I changed my code accordingly. – Stephan Nov 12 '20 at 08:00
  • Just FYI, this works and completed in about 15 minutes, on the aforementioned .7M files in 9K subdirectories. Very nice indeed! – RolfBly Nov 14 '20 at 12:39
  • Thank you for the feedback. A speed-up factor of ~40 is near the upper end of my range of expectation, so this is quite satisfying (especially because you wrote that speed is not really an issue). Please [read](https://stackoverflow.com/help/someone-answers) – Stephan Nov 14 '20 at 12:52
  • @Stephan I did read :-) Had not hit tick mark because I was trying out aschipfl's code too. – RolfBly Nov 14 '20 at 14:04

If I am getting what you want correctly, then this should do the trick. Untested! Will only be able to test this later:

@echo off
for /F "usebackq delims=" %%a in ("upload_filenames.txt") do (
  for /f "delims=" %%i in ('dir /b /s /a:d') do (
    pushd "%%~i"
    dir /b /a-d | findstr /i "%%~a">nul && echo %%~a;"%%~i\%%~a">>exists.txt || echo %%~a not found in "%%~i">>not_exist.txt
    popd
  )
)

or if you prefer not using conditional operators:

@echo off
for /F "usebackq delims=" %%a in ("upload_filenames.txt") do (
  for /f "delims=" %%i in ('dir /b /s /a:d') do (
    pushd "%%~i"
    dir /b /a-d | findstr /i "%%~a">nul
    if not errorlevel 1 (
      echo %%~a;"%%~i\%%~a">> exists.txt
    ) else (
      echo %%~a not found in "%%~i">>not_exist.txt
    )
    popd
  )
)

So the idea is to take the file names one by one, use dir to recursively list every directory, pushd into each directory, and then run dir there to check for the file.

Gerhard

Like others, I would use findstr, because it is a lot faster than using nested for loops. However, I would turn the thing around: let the list of files actually present be the search strings and use them to search the input list file upload_filenames.txt. I could not avoid one for loop, though, which derives the pure file names from the file paths. Anyway, here is the code:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_ROOT=."                         & rem // (path to target root directory)
set "_MASK=*.*"                       & rem // (file pattern, usually `*.*` for all)
set "_LIST=%~dp0upload_filenames.txt" & rem // (path to file containing name list)
set "_PASS=%~dp0found.txt"            & rem // (path to positive result file)
set "_FAIL=%~dp0missing.txt"          & rem // (path to negative result file)
set "_FULL=%~dpn0_all.tmp"            & rem // (path to a temporary file)
set "_NAME=%~dpn0_names.tmp"          & rem // (path to another temporary file)

rem // Put list of full paths of all files in the target directory tree to a file:
dir /S /B /A:-D "%_ROOT%\%_MASK%" > "%_FULL%"
rem // Reduce list of paths by keeping only the pure file names:
> "%_NAME%" (
    for /F "usebackq delims= eol=|" %%L in ("%_FULL%") do (
        echo(%%~nxL
    )
)
rem /* Let `findstr` twice do the search, using the file names from the target
rem    directory tree as search strings against the original list file: */
findstr /I /X    /L /G:"%_NAME%" "%_LIST%" > "%_PASS%"   
findstr /I /X /V /L /G:"%_NAME%" "%_LIST%" > "%_FAIL%"
rem // Clean up temporary files:
del "%_FULL%" "%_NAME%"

endlocal
exit /B
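
To see how the two findstr calls in the script above split the list, here is a minimal sketch with two tiny scratch files. The names names.tmp and wanted.tmp and their contents are made up for the demo; they stand in for %_NAME% and %_LIST%.

```batch
@echo off
rem Made-up stand-in for the list of file names present in the tree.
> "names.tmp" (
    echo a.txt
    echo b.txt
)
rem Made-up stand-in for upload_filenames.txt.
> "wanted.tmp" (
    echo A.TXT
    echo c.txt
)
rem Lines of wanted.tmp that exactly match a name (ignoring case): found.
findstr /I /X /L /G:"names.tmp" "wanted.tmp"
rem /V inverts the match: lines of wanted.tmp matching no name: missing.
findstr /I /X /V /L /G:"names.tmp" "wanted.tmp"
del "names.tmp" "wanted.tmp"
```

The first call should print A.TXT and the second c.txt, so each line of the input list lands in exactly one of the two results.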

Here is a slightly different approach, which returns full paths of the existing files, which becomes important when file names may occur multiple times within the given directory tree:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_ROOT=."                         & rem // (path to target root directory)
set "_MASK=*.*"                       & rem // (file pattern, usually `*.*` for all)
set "_LIST=%~dp0upload_filenames.txt" & rem // (path to file containing name list)
set "_PASS=%~dp0found.txt"            & rem // (path to positive result file)
set "_FAIL=%~dp0missing.txt"          & rem // (path to negative result file)
set "_AUGM=%_LIST%.tmp"               & rem // (path to temporary list file)
set "_FULL=%~dpn0_all.tmp"            & rem // (path to a temporary file)
set "_NAME=%~dpn0_names.tmp"          & rem // (path to another temporary file)

rem // Create augmented copy of list file with each line preceded by `\\`:
> "%_AUGM%" (
    for /F "usebackq delims= eol=|" %%L in ("%_LIST%") do (
        echo(\\%%~L
    )
)
rem // Put list of full paths of all files in the target directory tree to a file:
dir /S /B /A:-D "%_ROOT%\%_MASK%" > "%_FULL%"
rem // Reduce list of paths by keeping only the pure file names:
> "%_NAME%" (
    for /F "usebackq delims= eol=|" %%L in ("%_FULL%") do (
        echo(%%~nxL
    )
)
rem /* Let `findstr` do a search, using the augmented list file against the file
rem    containing the list of full paths in order to eventually get full paths,
rem    which is particularly important if file names are not unique in the tree: */
findstr /I /E    /L /G:"%_AUGM%" "%_FULL%" > "%_PASS%"
rem /* Let `findstr` do another search, using the file names from the target
rem    directory tree as search strings against the original list file this time: */
findstr /I /X /V /L /G:"%_NAME%" "%_LIST%" > "%_FAIL%"
rem // Clean up temporary files:
del "%_AUGM%" "%_FULL%" "%_NAME%"

endlocal
exit /B

N. B.: Luckily the nasty flaw of findstr with multiple literal search strings does not apply here, because we are performing case-insensitive searches. Also there is no problem with unintended escaping, because pure file names cannot contain \, which is the escape character of findstr.
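
If the result file should start with the bare file name, as in the question's exists.txt format, one possible post-processing step is sketched below. It is not part of the scripts above. It assumes that found.txt holds one full path per line (as the second script writes it) and sits in the current directory; found_named.txt is a hypothetical output name.

```batch
@echo off
rem Sketch only: assumes found.txt (one full path per line) is in the current directory.
rem "found_named.txt" is a hypothetical output name.
> "found_named.txt" (
    for /F "usebackq delims= eol=|" %%L in ("found.txt") do (
        rem %%~nxL is the name plus extension of the path held in %%L.
        echo(%%~nxL;%%L
    )
)
```

Redirecting the whole block once, rather than appending per line, keeps this pass cheap even for a long list.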

aschipfl
  • Very elaborate, but because of this approach the hit file doesn't have `filename.ext;` up front in the line, needed to make reviewing easier. It completed in about 40 minutes. – RolfBly Nov 15 '20 at 20:20