7

Please see below the original question for timing comparisons of the different approaches.


So far I have tried two approaches, and I am now trying a third:

1. Iterate through the directory using the code from "Get Folder Size from Windows Command Line":

@echo off
set size=0
for /r %%x in (folder\*) do set /a size+=%%~zx
echo %size% Bytes

2. Save the output of

'dir %folder% /s /a'

into a text file, and then read the total size from the summary at the bottom.

3. The last approach, which I am trying right now, is du (the Disk Usage tool from Microsoft Sysinternals - https://technet.microsoft.com/en-us/sysinternals/bb896651.aspx ).


With the exception of #3, these approaches seem far too slow for what I need (hundreds of thousands of files). So the question is: which of these is, or should be, the fastest, and are there any other fast(er) ways to get the size of a folder's contents when it holds 100k+ files (and there are hundreds of such folders)?


START EDIT:

Below is my very hacky way of doing the comparison (I butchered my program to see some output).
There are some small bugs: for example, Option 3 will fail when it has to handle a number bigger than the 32-bit limit, and I'm sure there are more issues, but I think the general timings are clear unless I really messed up my logic.
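
The 32-bit limit mentioned above is easy to hit: `set /a` works with signed 32-bit integers, so the largest total it can hold is 2,147,483,647 bytes (just under 2 GiB). Here is a rough illustration in Python, used only because it makes the arithmetic easy to check. The helper names `fits_in_set_a` and `to_megabytes` are made up for this sketch.

```python
# Largest value a signed 32-bit integer can hold; cmd.exe's SET /A
# arithmetic is limited to this range.
INT32_MAX = 2**31 - 1  # 2147483647


def fits_in_set_a(total_bytes: int) -> bool:
    """Return True if a byte count is small enough for SET /A (hypothetical helper)."""
    return 0 <= total_bytes <= INT32_MAX


def to_megabytes(total_bytes: int) -> int:
    """Convert bytes to whole megabytes, similar in spirit to the VBScript conversion."""
    return total_bytes // (1024 * 1024)


if __name__ == "__main__":
    three_gib = 3 * 1024**3
    print(INT32_MAX)
    print(fits_in_set_a(three_gib))
    print(to_megabytes(three_gib))
```

Converting to MB (or keeping the number as a plain string and never doing arithmetic on it) sidesteps the limit for folders larger than 2 GiB.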

Option I: Iterate through the directories, using a VBScript to read the text output of 'dir', look for the size at the end, and convert it to MB (I originally got this from somewhere else, but I have lost track of the source)
Option II: Iterate with a findstr pipe and output the result directly (no conversion to MB) - from @MC ND
Option III: Use the compact command to iterate - from @npocmaka
Option IV: Use robocopy - from @user1016274

(There are some more answers, but these are the ones I've been able to incorporate)

These are the results I got. They are pretty consistent relative to each other, and robocopy blows the others away:

Option I and Option II: usually close, with Option II slightly better (anywhere from 1 min 10 sec to 2 min 10 sec for both; I'm not sure where the variation comes from)
Option III: 16-17 minutes
Option IV: 10-20 seconds

@echo OFF
setlocal enabledelayedexpansion

REM OPTION I - directory iteration
REM OPTION II - iteration with findstr pipe
REM OPTION III - compact
REM OPTION IV - robocopy

:MAIN
REM Initialize log filename
REM NOTE: the date/time substring offsets below depend on the regional date format
set LOGFILEPOSTFIX=%date:~10,4%%date:~4,2%%date:~7,2%%time:~0,2%%time:~3,2%%time:~6,2%
REM Hours before 10 are padded with a space; replace it so the filename has no spaces
set LOGFILEPOSTFIX=%LOGFILEPOSTFIX: =0%
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% 
set "LOGFILE=Proj_not_in_db_%LOGFILEPOSTFIX%.log"


set option=1
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART I ---- Directory Listing into file, iterate through the sizes of all files inside folder >> %LOGFILE%
echo %TIMESTAMP% - PART I
call :PROCESSFOLDER
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART I ---- END >> %LOGFILE%
echo %TIMESTAMP% - PART I - END
set option=2
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART II  findstr pipe ---- >> %LOGFILE%
echo %TIMESTAMP% - PART II
call :PROCESSFOLDER
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART II ---- END>> %LOGFILE%
echo %TIMESTAMP% - PART II - END
set option=3
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART III compact ---- >> %LOGFILE%
echo %TIMESTAMP% - PART III
call :PROCESSFOLDER
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART III ---- END>> %LOGFILE%
echo %TIMESTAMP% - PART III - END
set option=4
set TIMESTAMP=%date:~10,4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%
echo %TIMESTAMP% - PART IV robocopy ---- >> %LOGFILE%
echo %TIMESTAMP% - PART IV
call :PROCESSFOLDER

call :CLEANUP
echo FINAL
pause
goto :EOF

:PROCESSFOLDER

    echo C:\Windows
    echo Processing C:\Windows >>  %LOGFILE%
    break > projects_in_folder.tmp
    for /f "tokens=1-4,* SKIP=7" %%b IN ('dir "C:\Windows" /Q /TW /AD') do (
        set _folder=%%f
        REM Don't write the 2 lines at the end displaying summary information
        if NOT "%%e" EQU "bytes" (
            SET _folder=!_folder:~23!
            echo !_folder!,%%b>> projects_in_folder.tmp
        )   
    )
    set "folder_path=C:\Windows"
    call :COMPARE
goto :EOF

:COMPARE
set file_name=%folder_path:\=_%
break > "%file_name%.txt"
if %option%==4 (
    set "full_path=C:\Windows"
    call :GETFOLDERINFO4
    REM Inside a parenthesized block, use delayed expansion so the time is read after robocopy has run
    set TIMESTAMP=!date:~10,4!_!date:~4,2!_!date:~7,2!_!time:~0,2!_!time:~3,2!_!time:~6,2!
    echo !TIMESTAMP! - PART IV ---- END>> %LOGFILE%
    echo !TIMESTAMP! - PART IV - END
)


for /f "tokens=1,2* delims=," %%a in (projects_in_folder.tmp) do (
    for /f "tokens=1,* delims=_" %%x in ("%%a") do (
        set "projcode=%%x"
    )
    set full_path=%folder_path%\%%a
    if %option%==1 call :GETFOLDERINFO 
    if %option%==2 call :GETFOLDERINFO2
    if %option%==3 call :GETFOLDERINFO3

    echo PROJ: %%a SIZE: !totalsize! LASTMODIFIED: %%b >> %LOGFILE%
)
goto :EOF

:GETFOLDERINFO2
set "size=0"
set target=!full_path!
for /f "tokens=3,5" %%a in ('
    dir /a /s /w /-c "%target%"
    ^| findstr /b /l /c:"  "
    ') do if "%%b"=="" set "size=%%a"
echo %size%
set totalsize=%size%
goto :EOF

:GETFOLDERINFO4
pushd "%full_path%" || goto :EOF
setlocal

for /f "tokens=1-10,* delims= " %%a in ('
    robocopy "%full_path%" "%TEMP%" /S /L /BYTES /XJ /NFL /NDL /NJH ^| find "Bytes"
') do echo %full_path%: %%c
popd    
goto :EOF

:GETFOLDERINFO
set totalsize=0
dir "%full_path%" /s /a > size.txt 
REM Run the VBScript, which prints the size in MB, and capture its output
pushd "%~dp0"
FOR /F "usebackq tokens=*" %%r in (`CSCRIPT //nologo "foldersize.vbs"`) DO SET totalsize=%%r
goto :EOF

:GETFOLDERINFO3
set "last=#"
set "_size="
for /f "tokens=1 delims= " %%s in ('compact /s:"%full_path%" /q ') do (
        set "_size=!last!"
        set "last=%%s"
)
set "_size=%_size:  =%"
set "_size=%_size: =%"
set "_size=%_size:.=%"
set "_size=%_size:,=%"
set "_size=%_size:      =%"
echo folder size is : %_size% bytes
set totalsize=%_size%
goto :EOF


:CLEANUP

REM Delete only the temp files in the current directory (no /S, which would also search subfolders)
DEL /Q projects_in_folder.tmp
DEL /Q size.txt
goto :EOF
StanM
  • I don't think Windows provides a native way to get the folder size; in all cases an iteration is needed. Though with the second approach you don't need a temp file, as you can parse the output with `FOR /F`. But I think the first one should be the fastest. – npocmaka May 28 '15 at 17:42
  • Faster? I would be shocked if any script ran faster than a compiled program issuing native Win32 calls, which I imagine Sysinternals' `du` does. – Bacon Bits May 28 '15 at 17:45
  • Your option 1 is not reliable - it will fail if the size exceeds 2 gigabytes due to limitations of SET /A. – dbenham May 29 '15 at 23:11
  • @dbenham I have a vbscript that converts the data into Megabytes and then the batch script scrubs the output – StanM May 31 '15 at 01:26
  • If you are willing to use VBScript, then you can use VBScript or JScript to conveniently do the entire thing. See [my answer](http://stackoverflow.com/a/30553994/1012053) – dbenham May 31 '15 at 04:55

6 Answers

7

You can try with (in the spirit of your second case)

@echo off
    setlocal enableextensions disabledelayedexpansion

    set "target=%~1"
    if not defined target set "target=%cd%"

    set "size=0"
    for /f "tokens=3,5" %%a in ('
        dir /a /s /w /-c "%target%"
        ^| findstr /b /l /c:"  "
    ') do if "%%b"=="" set "size=%%a"

    echo %size%
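
To make the filtering easier to follow, here is a rough Python sketch of the same selection logic, run on a made-up `dir /s /w /-c` listing. The sample text and the function name `total_size_from_dir_output` are assumptions for illustration; real `dir` output varies with Windows version and locale.

```python
# Made-up sample of English `dir /a /s /w /-c` output (an assumption for this sketch).
SAMPLE_DIR_OUTPUT = r"""
 Volume in drive C has no label.
 Volume Serial Number is 0000-0000

 Directory of C:\demo

[.]     [..]    a.txt   b.txt
               2 File(s)           3000 bytes

 Directory of C:\demo\sub

[.]     [..]    c.bin
               1 File(s)     5000000000 bytes

     Total Files Listed:
               3 File(s)     5000003000 bytes
               5 Dir(s)   120000000000 bytes free
"""


def total_size_from_dir_output(text):
    """Return the grand total as a string, like the batch variable %size%."""
    size = "0"
    for line in text.splitlines():
        # findstr /b /l /c:"  " keeps only lines that begin with two spaces
        if not line.startswith("  "):
            continue
        tokens = line.split()
        # for /f "tokens=3,5" skips lines that have no third token
        if len(tokens) < 3:
            continue
        fifth = tokens[4] if len(tokens) >= 5 else ""
        # "N Dir(s) X bytes free" has a fifth token, so it is ignored;
        # the last "N File(s) X bytes" line wins and holds the grand total
        if fifth == "":
            size = tokens[2]
    return size


if __name__ == "__main__":
    print(total_size_from_dir_output(SAMPLE_DIR_OUTPUT))
```

Because the value is only ever copied as text, never added up, the result is not affected by the 32-bit limit of `set /a`.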
MC ND
  • Not bad. It even looks independent of language settings. I think `dir /a:d /s /w /-c` will be faster, as the files are not included in the output. Maybe if you find a way to exclude `findstr` (I think it's possible) it will be faster. Anyway, +1. EDIT: `dir /a:d /s /w /-c` will give the free space .... – npocmaka May 28 '15 at 19:47
  • @npocmaka, if files are *excluded* they are not *included* in the size calc. The `findstr` is used to discard all the `dir` lines that do not include size information. But all the intermediate sizes (one for each folder) are not necessary. What is needed is not to remove the `findstr`, which limits the volume of information that `for /f` has to handle, but a way to get only the last two lines of the `dir` output, and I still don't see how to do that. – MC ND May 28 '15 at 19:55
  • @npocmaka, no, with 100k+ files (as indicated by OP), there is a lot of data to process with `for /f` (see the problem and the calcs at the end of [this answer](http://stackoverflow.com/a/30330566/2861476)). It is necessary to reduce the volume of data to process from command output. In the indicated scenario, the only way to remove the `findstr` is to use a temporary file. And the `for /f` will have to process all the lines in the output of `dir`. For cases like this the pipe is a faster solution. – MC ND May 28 '15 at 20:44
  • @npocmaka, and, having to read such a disk structure, the cost of the `findstr` command is negligible. – MC ND May 28 '15 at 20:46
  • OMG - I never realized you could split a long FOR /F command between single quotes across multiple lines like that! This feature makes me very happy :-) Thanks for demonstrating. – dbenham May 28 '15 at 21:40
6

After some testing and comparing the performance of

dir /s
compact /s
and PowerShell Get-ChildItem

I found that using robocopy is much faster. One additional advantage is that even very long paths (more than 256 characters, for instance in deeply nested folders) do not cause an error.
And if you prefer not to count data behind junctions, robocopy can easily exclude them with the /XJ option, like this:

@echo off
pushd "%~1" || goto :EOF

for /f "tokens=2 delims= " %%a in ('
    robocopy "%CD%" "%TEMP%" /S /L /BYTES /XJ /NFL /NDL /NJH /R:0 ^| find "Bytes"
') do echo %CD%: %%a
popd

If you leave out the /BYTES option you'll get the size value formatted in MB or GB. One would have to print the dimension (k,m,g,t denoting kilo, mega, giga, tera) as well in this case, using another loop variable:

@echo off
setlocal ENABLEDELAYEDEXPANSION

pushd "%~1" || goto :EOF
set "folder=%CD%"
if NOT "%folder: =%"=="%folder%" set folder="%folder%"

for /f "tokens=2-3 delims= " %%a in (
    'robocopy %folder% %folder% /S /L /XJ /NFL /NDL /NJH /R:0 ^| findstr /I "Bytes"'
) do ( 
    set dim=%%b
    set "dim=!dim:k=KB!" & set "dim=!dim:m=MB!" & set "dim=!dim:g=GB!" & set "dim=!dim:t=TB!"    
    if !dim! EQU %%b set dim=B
    echo ^    %CD%: %%a !dim!
)
popd

The robocopy command here does not actually copy anything (due to the '/L' list option) but prints a summary line containing the sum of the filesizes which then is parsed. As robocopy still expects valid paths for the source and destination folders, the folder name is used twice.

The folder name may or may not contain spaces and thus may need to be quoted. That is taken care of in the first lines. %%b holds either the dimension letter or a numeric value. This is tested by string substitution to avoid the 32-bit limit of set /A.
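
The parsing step can be sketched in Python as follows. This is only an illustration under assumptions: the sample summary lines are made up, and `parse_bytes_line` is a hypothetical helper. The test script in the question reads the third token (%%c) while the scripts here read the second, which suggests the label can show up as either `Bytes :` or `Bytes:` depending on the system, so the sketch accepts both.

```python
import re

# Unit letters that may follow the value when /BYTES is not used.
UNIT_LETTERS = {"k", "m", "g", "t"}


def parse_bytes_line(line):
    """Return (value_text, unit) for the total column of a 'Bytes' summary line."""
    # Strip the label, which may be written "Bytes :" or "Bytes:".
    match = re.match(r"\s*Bytes\s*:\s*(.*)", line)
    if match is None:
        raise ValueError("not a Bytes summary line")
    tokens = match.group(1).split()
    if not tokens:
        raise ValueError("summary line has no values")
    value = tokens[0]
    # The token after the value is either a unit letter or the next numeric column.
    unit = "B"
    if len(tokens) > 1 and tokens[1].lower() in UNIT_LETTERS:
        unit = tokens[1].upper() + "B"
    return value, unit


if __name__ == "__main__":
    print(parse_bytes_line("   Bytes :   1.234 m         0   1.234 m         0"))
    print(parse_bytes_line("   Bytes :   1293942         0   1293942         0"))
```

As in the batch version, the value is kept as text, so no arithmetic is done on it.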

user1016274
  • I did some hacky testing and it does indeed seem this is much much faster, going to update my question with the code (which is painful because it is basically butchered version of my code that had to do a bunch of other things) – StanM Jun 03 '15 at 20:23
  • Brilliant solution and ultra fast!+1 – Andreas Jun 17 '16 at 07:54
  • In case anybody gets an error with this command, or something like invalid parameter `/BYTES` then you are probably using old/wrong version of robocopy from windows kit. Check with `where robocopy` what version you use! – Pavel P Mar 27 '17 at 08:26
  • This code isn't complete. What is CD and TEMP? – Carlos Apr 27 '22 at 11:11
  • %CD% is an environment variable containing the current path. We change into the given path by using `pushd` and use the full path returned in %CD%. %TEMP% is another env variable which now is no longer needed. – user1016274 Apr 28 '22 at 15:08
3

Since you are willing to use VBScript (based on your comment below your question), then you can simply use the FileSystemObject Folder object Size property. It reports the total size of all files within the folder, including files in all sub-folders (recursive).

The following simple JScript script prints out the size of the current folder:

var fso = new ActiveXObject("Scripting.FileSystemObject");
WScript.Echo(fso.GetFolder('.').Size);

I chose JScript instead of VBScript because it is simple to embed JScript within a batch script (though there are methods to do the same with VBScript).

Here is a simple hybrid script utility that reports the total size of any path you pass in as the first and only argument. The hybrid script makes it very convenient to call, since you don't have to specify CSCRIPT.

FolderSize.bat

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::FolderSize.bat  FolderPath
::
::  Print the total size of all files within FolderPath,
::  including all sub-folders, recursively.

::******** Batch Code *********
@echo off
cscript //nologo //e:jscript "%~f0" %1
exit /b

********** JScript Code *******/
var fso = new ActiveXObject("Scripting.FileSystemObject");
WScript.Echo(fso.GetFolder(WScript.Arguments.Unnamed(0)).Size);

The only limitation is you must have access to all folders (and files?) within the folder, otherwise it fails with an error message.
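
For comparison, this is roughly what a recursive total amounts to, written as a small Python sketch rather than JScript. It is an illustration only, not how FileSystemObject is implemented. The function name `folder_size` is made up, and skipping inaccessible folders instead of failing is a design choice of this sketch.

```python
import os


def folder_size(path):
    """Sum the sizes of all regular files under path, including sub-folders."""
    total = 0
    pending = [path]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            # For example, access denied: skip this folder instead of aborting
            continue
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue
    return total


if __name__ == "__main__":
    print(folder_size("."))
```

Python integers are not limited to 32 bits, so totals above 2 GiB are not a problem here.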

dbenham
  • You could use `Scripting.FileSystemObject` even in a PowerShell script (both PowerShell and cscript are present on a Windows PC out of the box). But both the FSO and the Get-ChildItem methods suffer from poor, `DIR`-like performance in comparison to a script calling `robocopy`. See http://blogs.technet.com/b/heyscriptingguy/archive/2013/08/03/weekend-scripter-use-powershell-to-get-folder-sizes.aspx for a FSO script, and http://www.powershelladmin.com/wiki/Get_Folder_Size_with_PowerShell,_Blazingly_Fast for a robocopy script example (although quite bloated IMHO). – user1016274 Jun 01 '15 at 09:34
  • to echo the comment above - that is what i am using in vbscript, and it was my understanding that it's the same as iterating through all files in the folder -- Set objFSO=CreateObject("Scripting.FileSystemObject") -- Set objFile=objFSO.OpenTextFile("size.txt",1) --> this is from part 2 of my question, where it basically does a 'dir' and then grabs the size – StanM Jun 01 '15 at 14:24
  • Although I didn't know I could do JavaScript inside batch, so that's good to know (I had a problem running the above, but I probably just need to look into it more). – StanM Jun 01 '15 at 14:25
1

try this:

:foldersize
@echo off
pushd "%~1"

setlocal
set "_size="
for /f "tokens=1 delims=t" %%s in ('compact /s /q ^|find " total bytes"') do (
        set "_size=%%s"
)
set "_size=%_size:  =%"
set "_size=%_size: =%"
set "_size=%_size:.=%"
set "_size=%_size:,=%"
set "_size=%_size:      =%"
echo folder size is : %_size% bytes
endlocal
popd

It accepts one argument - the folder. The compact /s /q (/q is for reporting so no changes will be applied) produces less output, and there's a chance it will be faster than DIR.

EDIT: slightly optimized variants (one of them is @MC ND's, which is probably the faster). The idea is to skip FIND or FINDSTR, as they are external programs and will make the scripts slower:

:foldersize
@echo off
pushd "%~1"

setlocal enableDelayedExpansion
set "last=#"
set "_size="
for /f "tokens=1 delims= " %%s in ('compact /s /q') do (
        set "_size=!last!"
        set "last=%%s"
)


set "_size=%_size:  =%"
set "_size=%_size: =%"
set "_size=%_size:.=%"
set "_size=%_size:,=%"
set "_size=%_size:      =%"
echo folder size is : %_size% bytes
endlocal
popd

AND

@echo off
:original script by MC ND
    setlocal enableextensions enableDelayedExpansion

    set "target=%~1"
    if not defined target set "target=%cd%"

    set "size=0"
    set "last=#"
    set "pre_last=#"
    rem set "pre_pre_last=#"
    for /f "tokens=3" %%a in ('
        dir /a:-d /s /w /-c "%target%"  
    ') do  (

        set "pre_last=!last!"
        set "last=%%a"


    )
    echo !pre_last!
npocmaka
  • I am a bit confused as to what compact actually does, from reading up on it, I am still not sure whether it alters the compression of files / directories, or if it will simply display information. I don't want to muck around changing a lot of folders en masse – StanM May 28 '15 at 19:51
  • @StanM - `/q` option is for reporting only so there will be no changes. – npocmaka May 28 '15 at 19:54
  • @npocmaka: `/q` option only suppresses some output. `compact` will list only if you leave out any arguments (like *.*). – user1016274 May 29 '15 at 18:55
0

I think that looping over each line of output of the compact or dir command is inefficient and can be avoided by filtering the interim result:

@echo off
REM dirsize.cmd 2015-05-29

pushd "%~1" || goto :EOF
setlocal
for /f "tokens=1-3*" %%A in ('compact /s /a /q ^| find "Datenbytes" ^| find /v "Auflistung"') do echo %CD%: %%A %%B %%C
popd

Changes:
- the script will terminate if the given path does not exist, rather than scanning the current directory
- compact /a is used to include hidden and system files as well
- the complete output is piped into a find. This is where a locale-dependent search string is needed, to filter out the summary line. In German it's "Datenbytes", but this may also appear in a folder name. Thus, a second, negative filter suppresses those lines. Again, this is locale dependent (but independence was not called for).

The advantage is that find will discard output lines faster than a shell loop with variable assignments. The cost of calling it is negligible.

Please note that compact /q will not stop the compression action. It will only shorten the output. Not supplying any arguments in the call to compact will make it list only and not compact files/folders.

edit: Though these points are all valid IMHO, see my other answer for a much faster way.

user1016274
0

If you're not opposed to using PowerShell, here's a quick script you can use:

param([String]$path=".")
# -Recurse is needed so that files in sub-folders are counted as well
Get-ChildItem $path -Recurse | Measure-Object -Property Length -Sum
Icemanind
  • Add '-Force' to GCI to include hidden files. Will choke on paths that are too deeply nested (> 256 chars length). And it's as slow as `DIR`. – user1016274 May 29 '15 at 18:57