Best if you truly understand what is slowing down your script.
The following all contribute to unacceptable performance:
- Excessive CALLs in a loop
- Randomly selecting a number until you get one that has not been selected yet
- Redirection in append mode for each file - best to redirect only once
Enabling and disabling delayed expansion normally does not contribute much to bad performance (unless you have a massively large environment space)
My code below uses the FINDSTR technique to quickly build the array, without needing delayed expansion.
I then can enable delayed expansion just once for each folder. I never have to pass any values across the endlocal "barrier"
I guarantee that each random operation selects an available file by building a list of possible numbers, fixed width of 4 with leading spaces. The list must fit in a single variable with max length of ~8190 bytes, so this solution supports up to ~2040 files per folder. Each random number specifies which position to take from the list, and then the value is extracted and the count decremented.
I enclose the entire outer loop in an extra set of parentheses so that I only need to redirect once.
I'm pretty sure this code will be significantly faster than even your 2nd code that does not support ! etc.
benham1.bat
@echo off
chcp 1254>nul
setlocal DisableDelayedExpansion
set /p "maxCnt=Select Desired Number of Random Episodes per Album:"
pushd "H:\itunes\Podcasts"
>"%USERPROFILE%\Desktop\RandomEpisodes.m3u8" (
for /d %%F in (*) do (
pushd "%%F"
set "fileCnt=0"
for /f "delims=: tokens=1,2" %%A in ('dir /b /a-d 2^>nul^|findstr /n "^"') do (
set "$$%%A=%%B"
set "fileCnt=%%A"
)
setlocal enableDelayedExpansion
set "nums="
for /l %%N in (1 1 !fileCnt!) do (
set "n= %%N"
set "nums=!nums!!n:~-4!"
)
if !fileCnt! lss !maxCnt! (set "cnt=!fileCnt!") else set "cnt=!maxCnt!"
for /l %%N in (1 1 !cnt!) do (
set /a "pos=(!random!%%fileCnt)*4, next=pos+4, fileCnt-=1"
for /f "tokens=1,2" %%A in ("!pos! !next!") do (
for /f %%N in ("!nums:~%%A,4!") do echo !$$%%N!
set "nums=!nums:~0,%%A!!nums:~%%B!"
)
)
endlocal
popd
)
)
popd
Update and solution
It really bothered me that this code behaved so poorly in RKO's hands (see his comment). So I did some tests of my own to figure out what is happening.
I ran the code against 500 folders with 100 files in each folder, and I asked for an output of 50 files for each folder. I modified the loop to print out the name of each folder to stderr so I could monitor progress. As expected, each folder was processed very quickly at the beginning. But as the program progressed, each folder became slower than the previous one, in a non-linear fashion. By the time it reached the end, each folder was painfully slow.
It took ~3 minutes to process 500 folders. I then doubled the number of folders, and the time exploded to ~18 minutes. Very nasty.
Way back when, a group of us at DosTips tried to investigate how cmd.exe manages the environment space, and how large environments impact performance. See Why does SET performance degrade as environment size grows?
We determined that ENDLOCAL does not actually release allocated memory, which causes SET performance to degrade as the number of SET operations accumulates. This problem has lots of SET operations, so it makes sense that it becomes slow.
But RKO has code with lots of inefficiencies that is performing better than mine. In the absence of environment size issues, his code should be much slower. So somehow his code must not be accumulating memory like mine. So I went on a quest to isolate the memory allocation for each folder from all the rest.
My first attempt was to put all the memory allocation for a single folder within a new cmd.exe process. And it worked! Processing 500 folders now took ~1 minute, and doubling to 1000 folders basically doubled the time to ~2 minutes!
benham2.bat
@echo off
if "%~1" equ ":processFolder" (
pushd "%folder%"
set "fileCnt=0"
for /f "delims=: tokens=1,2" %%A in ('dir /b /a-d 2^>nul^|findstr /n "^"') do (
set "$$%%A=%%B"
set "fileCnt=%%A"
)
setlocal enableDelayedExpansion
set "nums="
for /l %%N in (1 1 !fileCnt!) do (
set "n= %%N"
set "nums=!nums!!n:~-4!"
)
if !fileCnt! lss !maxCnt! (set "cnt=!fileCnt!") else set "cnt=!maxCnt!"
for /l %%N in (1 1 !cnt!) do (
set /a "pos=(!random!%%fileCnt)*4, next=pos+4, fileCnt-=1"
for /f "tokens=1,2" %%A in ("!pos! !next!") do (
for /f %%N in ("!nums:~%%A,4!") do echo !$$%%N!
set "nums=!nums:~0,%%A!!nums:~%%B!"
)
)
popd
exit
)
setlocal DisableDelayedExpansion
set /p "maxCnt=Select Desired Number of Random Episodes per Album:"
pushd "H:\itunes\Podcasts"
>"%USERPROFILE%\Desktop\RandomEpisodes.m3u8" (
for /d %%F in (*) do (
set "folder=%%F"
cmd /v:off /c ^""%~f0" :processFolder^"
)
)
popd
exit /b
But then I realized that I didn't have to restart my console for each test of my original benham1 code - when the batch script terminated, the memory seemed to have reset because the next run would start out just as fast as the prior one.
So I thought, why not simply CALL a :subroutine instead of initiating a new cmd.exe. This worked about the same, just a little bit better!
benham3.bat
@echo off
setlocal DisableDelayedExpansion
set /p "maxCnt=Select Desired Number of Random Episodes per Album:"
pushd "H:\itunes\Podcasts"
>"%USERPROFILE%\Desktop\RandomEpisodes.m3u8" (
for /d %%F in (*) do (
set "folder=%%F"
call :go
)
)
popd
exit /b
:go
setlocal
cd %folder%
set "fileCnt=0"
for /f "delims=: tokens=1,2" %%A in ('dir /b /a-d 2^>nul^|findstr /n "^"') do (
set "$$%%A=%%B"
set "fileCnt=%%A"
)
setlocal enableDelayedExpansion
set "nums="
for /l %%N in (1 1 !fileCnt!) do (
set "n= %%N"
set "nums=!nums!!n:~-4!"
)
if !fileCnt! lss !maxCnt! (set "cnt=!fileCnt!") else set "cnt=!maxCnt!"
for /l %%N in (1 1 !cnt!) do (
set /a "pos=(!random!%%fileCnt)*4, next=pos+4, fileCnt-=1"
for /f "tokens=1,2" %%A in ("!pos! !next!") do (
for /f %%N in ("!nums:~%%A,4!") do echo !$$%%N!
set "nums=!nums:~0,%%A!!nums:~%%B!"
)
)
exit /b
Another Update
I substituted Aacini's superior Array based method for guaranteeing each random operation selects a unique file name in place of my string based method. It yields slightly better performance:
benham-aacini.bat
@echo off
setlocal DisableDelayedExpansion
set /p "maxCnt=Select Desired Number of Random Episodes per Album:"
pushd "H:\itunes\Podcasts"
>"%USERPROFILE%\Desktop\RandomEpisodes.m3u8" (
for /d %%F in (*) do (
set "folder=%%F"
call :go
)
)
popd
exit /b
:go
setlocal
cd "%folder%"
set "fileCnt=0"
for /f "delims=: tokens=1,2" %%A in ('dir /b /a-d 2^>nul^|findstr /n "^"') do (
set "$$%%A=%%B"
set /a "fileCnt=%%A, RN%%A=%%A"
)
setlocal enableDelayedExpansion
if !fileCnt! lss !maxCnt! (set end=1) else set /a "end=fileCnt-maxCnt+1"
for /l %%N in (!fileCnt! -1 !end!) do (
set /a "ran=!random!%%%%N+1"
set /a "num=RN!ran!, RN!ran!=RN%%N
for %%N in (!num!) do echo !cd!\!$$%%N!
)
exit /b
Here is a summary of the timings of each version:
folders | benham1 benham2 benham3 benham-aacini
---------+-------------------------------------------
500 | 2:49 0:53 0:43 0:41
1000 | 17:48 1:56 1:44 1:34
So our original thinking at DosTips that the environment never shrinks is wrong. But I haven't had time to fully test and determine exactly when it shrinks.
Regardless, I think I finally have a version that is truly faster than your current 13 minute code :-)
Yet Another Update (Assume few collisions)
Multiple random selections from the complete set of files might result in duplicates. My code assumes that the number of requested files might be a large portion of a large list, in which case you can expect to get many duplicates.
So all of my previous solutions did some extra bookkeeping to guarantee that each random operation results in a unique file.
But RKO seems to typically select just a few files from each folder. So the chance of collision is small. His code just randomly selects a file, and then if the file has been selected before, it loops back and tries again until it finds a new file. But since the chance of collision is small, the retry rarely happens. This method has significantly less bookkeeping. The result is that his random selection algorithm is faster as long as the number requested remains small.
So I have adopted my code to use a slightly modified version of RKO's selection method. I expect this to be the fastest yet for a small request counts.
benham4.bat
@echo off
setlocal DisableDelayedExpansion
set /p "maxCnt=Select Desired Number of Random Episodes per Album:"
pushd "H:\itunes\Podcasts"
>"%USERPROFILE%\Desktop\RandomEpisodes.m3u8" (
for /d %%F in (*) do (
set "folder=%%F"
call :selectRandom
)
)
popd
exit /b
:selectRandom
setlocal
cd "%folder%"
set "fileCnt=0"
for /f "delims=: tokens=1,2" %%A in ('dir /b /a-d 2^>nul^|findstr /n "^"') do (
set "$$%%A=%%B"
set "fileCnt=%%A"
)
setlocal enableDelayedExpansion
if !fileCnt! lss !maxCnt! (set /a end=fileCnt) else set /a end=maxCnt
for /l %%N in (1 1 !end!) do (
set /a "N=!random!%%fileCnt+1"
if not defined $$!N! call :tryAgain
for %%N in (!N!) do echo !folder!\!$$%%N!
set "$$!N!="
)
exit /b
:tryAgain
set /a "N=!random!%%fileCnt+1"
if not defined $$!N! goto :tryAgain
exit /b