I need to calculate the size of hundreds of folders; some will be 10MB, some maybe 10GB. I need a super-fast way of getting the size of each folder using C#.
My end result will hopefully be:
Folder1 10.5GB
Folder2 230MB
Folder3 1.2GB
...
Add a reference to the Microsoft Scripting Runtime and use:
Scripting.FileSystemObject fso = new Scripting.FileSystemObject();
Scripting.Folder folder = fso.GetFolder([folder path]);
Int64 dirSize = (Int64)folder.Size;
If you just need the size, this is much faster than recursing.
OK, this is terrible, but...
Use a recursive DOS batch file called dirsize.bat:
@ECHO OFF
IF %1x==x GOTO start
IF %1x==DODIRx GOTO dodir
SET CURDIR=%1
FOR /F "usebackq delims=" %%A IN (`%0 DODIR`) DO SET ANSWER=%%A %CURDIR%
ECHO %ANSWER%
GOTO end
:start
FOR /D %%D IN (*.*) DO CALL %0 "%%D"
GOTO end
:dodir
DIR /S/-C %CURDIR% | FIND "File(s)"
GOTO end
:end
Note: There should be a tab character after the final "%%A" on line 5, not spaces.
This is the data you're looking for. It will handle thousands of files fairly quickly; in fact, it does my entire hard drive in less than 2 seconds. Execute the file like this:
dirsize | sort /R /+25
in order to see the largest directory listed first.
Good luck.
The fastest approach I could find on the 4.0-4.5 framework to calculate file sizes and their count on disk was:
using System.IO;
using System.Threading;
using System.Threading.Tasks;
class FileCounter
{
    private readonly int _clusterSize;
    private long _filesCount;
    private long _size;
    private long _diskSize;

    public void Count(string rootPath)
    {
        // Enumerate files lazily (no disk I/O happens until iteration)
        var filesEnumerated = new DirectoryInfo(rootPath)
            .EnumerateFiles("*", SearchOption.AllDirectories);

        // Process the files in parallel
        Parallel.ForEach(filesEnumerated, GetFileSize);
    }

    /// <summary>
    /// Get the file size and add it to the totals
    /// </summary>
    /// <param name="fileInfo">File information</param>
    private void GetFileSize(FileInfo fileInfo)
    {
        Interlocked.Increment(ref _filesCount);
        Interlocked.Add(ref _size, fileInfo.Length);
    }
}
var fcount = new FileCounter();
fcount.Count("F:\\temp");
This was the best approach I could find on the .NET platform. By the way, if you need to calculate the cluster size and the real size on disk, you can do the following:
using System.Runtime.InteropServices;
private long WrapToClusterSize(long originalSize)
{
    // Round up to the nearest multiple of the cluster size
    return ((originalSize + _clusterSize - 1) / _clusterSize) * _clusterSize;
}

private static int GetClusterSize(string rootPath)
{
    int sectorsPerCluster = 0, bytesPerSector = 0, numFreeClusters = 0, totalNumClusters = 0;
    if (!GetDiskFreeSpace(rootPath, ref sectorsPerCluster, ref bytesPerSector,
                          ref numFreeClusters, ref totalNumClusters))
    {
        // Satisfies rule CallGetLastErrorImmediatelyAfterPInvoke.
        // See http://msdn.microsoft.com/en-us/library/ms182199(v=vs.80).aspx
        var lastError = Marshal.GetLastWin32Error();
        throw new Exception(string.Format("GetDiskFreeSpace failed, error code {0}", lastError));
    }
    return sectorsPerCluster * bytesPerSector;
}

[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool GetDiskFreeSpace(
    string rootPath,
    ref int sectorsPerCluster,
    ref int bytesPerSector,
    ref int numFreeClusters,
    ref int totalNumClusters);
And of course you need to rewrite GetFileSize() from the first code section:
private long _diskSize;

private void GetFileSize(FileInfo fileInfo)
{
    Interlocked.Increment(ref _filesCount);
    Interlocked.Add(ref _size, fileInfo.Length);
    Interlocked.Add(ref _diskSize, WrapToClusterSize(fileInfo.Length));
}
There is no simple way to do this in .NET; you will have to loop through every file and subdirectory. See the examples here to see how it's done.
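A minimal sketch of that loop-through approach, assuming only the total size is needed (the helper name is illustrative):

```csharp
using System.IO;

// Illustrative helper: add up the file sizes in this directory,
// then recurse into each subdirectory.
static long GetTotalSize(string path)
{
    long total = 0;
    foreach (string file in Directory.GetFiles(path))
    {
        total += new FileInfo(file).Length;
    }
    foreach (string subDir in Directory.GetDirectories(path))
    {
        total += GetTotalSize(subDir);
    }
    return total;
}
```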
You can do something like this, but there's no fast=true setting when it comes to getting folder sizes; you have to add up the file sizes.
using System;
using System.Collections.Generic;
using System.IO;

private static IDictionary<string, long> folderSizes;

public static long GetDirectorySize(string dirName)
{
    // Use memoization to avoid recomputing sizes we already know
    if (folderSizes.ContainsKey(dirName))
    {
        return folderSizes[dirName];
    }

    // Add up the files in the current directory
    long size = 0;
    foreach (string name in Directory.GetFiles(dirName, "*.*"))
    {
        size += new FileInfo(name).Length;
    }

    // Recurse on all the directories in the current directory
    foreach (string d in Directory.GetDirectories(dirName))
    {
        size += GetDirectorySize(d);
    }

    folderSizes[dirName] = size;
    return size;
}
static void Main(string[] args)
{
    folderSizes = new Dictionary<string, long>();
    GetDirectorySize(@"c:\StartingFolder");

    // Now folderSizes contains a key for each directory (starting
    // at c:\StartingFolder and including all subdirectories), and
    // the dictionary value is the folder size in bytes.
    foreach (string key in folderSizes.Keys)
    {
        Console.WriteLine("dirName = " + key + " dirSize = " + folderSizes[key]);
    }
}
If you right-click a large directory and choose Properties, you can see that it takes a significant amount of time to calculate the size; I don't think we can beat MS at this. One thing you could do is index the sizes of directories/subdirectories if you are going to calculate them over and over again; that would significantly increase the speed.
You could use something like this to calculate directory size in C# recursively:
static long DirSize(DirectoryInfo directory)
{
    long size = 0;
    foreach (FileInfo file in directory.GetFiles())
    {
        size += file.Length;
    }
    foreach (DirectoryInfo dir in directory.GetDirectories())
    {
        size += DirSize(dir);
    }
    return size;
}
Dot Net Pearls has a method similar to the ones described here. It's surprising that the System.IO.DirectoryInfo class doesn't have a method for this, since it seems like a common need, and it would probably be faster without a native/managed transition on each file system object. I do think that if speed is the key thing, it's worth writing a non-managed object to do this calculation and then calling it once per directory from managed code.
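As a rough illustration of that native route, one could P/Invoke FindFirstFile/FindNextFile directly and read the sizes out of WIN32_FIND_DATA, avoiding a FileInfo allocation per file. This is only a sketch, not production code: it ignores reparse points, long paths, and access-denied errors.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeDirSize
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll")]
    static extern bool FindClose(IntPtr hFindFile);

    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
    const uint FILE_ATTRIBUTE_DIRECTORY = 0x10;

    public static long DirSize(string path)
    {
        long total = 0;
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(path + "\\*", out data);
        if (handle == INVALID_HANDLE_VALUE)
            return 0;
        try
        {
            do
            {
                if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
                {
                    // Skip the "." and ".." pseudo-entries, recurse into the rest
                    if (data.cFileName != "." && data.cFileName != "..")
                        total += DirSize(path + "\\" + data.cFileName);
                }
                else
                {
                    // Combine the high and low 32-bit halves into a 64-bit size
                    total += ((long)data.nFileSizeHigh << 32) | data.nFileSizeLow;
                }
            } while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }
        return total;
    }
}
```

The same enumeration that yields each name also carries its size, so there is one native call per entry instead of an extra metadata lookup per file.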
There are some leads in this link (though it's in Python) from a person running into similar performance issues. You can try calling down into the Win32 API to see if performance improves, but in the end you're going to run into the same issue: a task can only be done so quickly, and if you have to do it many times, it will take a lot of time. Can you give more detail on what you're doing this for? It might help folks come up with a heuristic or some cheats to help you. If you're doing this calculation a lot, are you caching the results?
I'm quite sure that this will be slow as hell, but I'd write it like this:
using System.IO;
using System.Linq;

long GetDirSize(string dir)
{
    return new DirectoryInfo(dir)
        .GetFiles("*", SearchOption.AllDirectories)
        .Sum(f => f.Length);
}