
I'm working with a legacy system that has numerous imports from external systems, most of which function by downloading a file (of varying sizes depending on context), processing it and then storing the file elsewhere on a SAN volume (formatted as NTFS and mounted on a WS2008R2 box). The problem we're having is that the sheer volume of little files ends up wasting large amounts of disk space due to the cluster size.

Ideally we'd locate the worst offending import processes and put in place some automated archiving on the files into .zip files or something similar. Building a report on this should be a relatively simple problem, but I'm struggling to get an accurate "size on disk" (as seen in Explorer). (Yes we could just archive everything after X days, but it's not ideal and doesn't necessarily help tune import processes that could be adapted somewhat to avoid the issue)

I've seen answers like How to get the actual size-on-disk of a file from PowerShell?, but whilst they work well for dealing with compressed folders, for short uncompressed files I just get the same value as the file length and so underestimate true disk usage.
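For illustration, a naive "size on disk" estimate just rounds each file's length up to a whole cluster. The sketch below does that — the path and the 4096-byte cluster size are assumptions — and it ignores exactly the cases that matter here (MFT-resident files and NTFS compression), which is why something more accurate is needed:

```powershell
# Naive estimate: round each file's length up to an assumed 4096-byte cluster.
# The path and cluster size are illustrative; this ignores MFT-resident files
# and NTFS compression, so it is only a rough guess.
$clusterSize = 4096
Get-ChildItem -Path 'D:\Imports' -Recurse |
    Where-Object { -not $_.PSIsContainer } |
    ForEach-Object { [math]::Ceiling($_.Length / $clusterSize) * $clusterSize } |
    Measure-Object -Sum
```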

The files on the volume vary from some small enough to fit into the MFT records, some which only occupy a small percentage of a cluster and others that are very large. NTFS Compression isn't enabled anywhere on the volume, though a solution which could accommodate that would be more future-proof as we may enable it in future. The volume is normally accessed via a UNC share so if it's possible to determine usage via the share (Explorer seems able to) that would be great, but it's not essential as the script can always run on the server itself and access the drive directly.

user3437708
  • I'm not sure if this is what you're looking for, but have you thought about grabbing the bytes-per-sector value from WMI? `Get-WmiObject -Class Win32_DiskDrive | Select-Object -Property Caption,BytesPerSector` –  Mar 19 '14 at 13:55

2 Answers


You need a little P/Invoke:

Add-Type -TypeDefinition @'
using System;
using System.Runtime.InteropServices;
using System.ComponentModel;
using System.IO;

namespace Win32Functions
{
  public class ExtendedFileInfo
  {    
    public static long GetFileSizeOnDisk(string file)
    {
        FileInfo info = new FileInfo(file);
        uint dummy, sectorsPerCluster, bytesPerSector;
        int result = GetDiskFreeSpaceW(info.Directory.Root.FullName, out sectorsPerCluster, out bytesPerSector, out dummy, out dummy);
        if (result == 0) throw new Win32Exception();
        uint clusterSize = sectorsPerCluster * bytesPerSector;
        uint hosize;
        uint losize = GetCompressedFileSizeW(file, out hosize);
        // 0xFFFFFFFF can mean failure; confirm via the last Win32 error
        if (losize == 0xFFFFFFFF && Marshal.GetLastWin32Error() != 0) throw new Win32Exception();
        long size = (long)hosize << 32 | losize;
        // Round the real size up to a whole number of clusters
        return ((size + clusterSize - 1) / clusterSize) * clusterSize;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint GetCompressedFileSizeW([In, MarshalAs(UnmanagedType.LPWStr)] string lpFileName,
       [Out, MarshalAs(UnmanagedType.U4)] out uint lpFileSizeHigh);

    [DllImport("kernel32.dll", SetLastError = true, PreserveSig = true)]
    static extern int GetDiskFreeSpaceW([In, MarshalAs(UnmanagedType.LPWStr)] string lpRootPathName,
       out uint lpSectorsPerCluster, out uint lpBytesPerSector, out uint lpNumberOfFreeClusters,
       out uint lpTotalNumberOfClusters);  
  }
}
'@

Use like this:

[Win32Functions.ExtendedFileInfo]::GetFileSizeOnDisk( 'C:\ps\examplefile.exe' )
59580416

It returns the 'size on disk' value that you see in a file's Properties dialog in Explorer.
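Building on this, a rough per-folder report of logical vs. allocated size (to find the worst-offending import directories) might look like the sketch below — the path and property names are illustrative:

```powershell
# Rough waste report per top-level import folder (path is illustrative)
Get-ChildItem -Path 'D:\Imports' |
    Where-Object { $_.PSIsContainer } |
    ForEach-Object {
        $files   = Get-ChildItem -Path $_.FullName -Recurse |
                   Where-Object { -not $_.PSIsContainer }
        $logical = ($files | Measure-Object -Property Length -Sum).Sum
        $onDisk  = ($files |
                    ForEach-Object { [Win32Functions.ExtendedFileInfo]::GetFileSizeOnDisk($_.FullName) } |
                    Measure-Object -Sum).Sum
        New-Object PSObject -Property @{
            Folder   = $_.Name
            Logical  = $logical
            OnDisk   = $onDisk
            WastedKB = [math]::Round(($onDisk - $logical) / 1KB, 1)
        }
    } | Sort-Object WastedKB -Descending
```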

CB.
  • It's not entirely accurate I notice, for really small files (the ones NTFS stores in the MFT record) it over-estimates and gives a bigger number than Explorer (4K). However it's probably "good enough" for the intended purpose. – user3437708 Mar 19 '14 at 14:24
  • @user3437708 "gives a bigger number than Explorer", you mean bigger than 4k? or every time it returns 4k? Are you on a 'largeFRS' formatted volume? – CB. Mar 19 '14 at 15:17
  • Apologies, I mean it returns 4Kb, whereas Explorer returns 0 bytes since the file is being stored in the NTFS metadata rather than an actual distinct cluster. You can see it yourself if you create a file that's only a few bytes long (the exact length that fits into the NTFS metadata doesn't seem to be deterministic AFAIK) – user3437708 Mar 19 '14 at 15:35
  • 1
    @user3437708 $MFT record can be 1024 bytes in length for 'standard' formatted volume or 4096 bytes using largeFRS ( starting from windows 8 / server 2012 ). The snipped code in my answer I'm sure not considering $mft record size but just the disk cluster size, this explain the 4k for mini-sized file. – CB. Mar 19 '14 at 15:39
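For what it's worth, the file-record size (and the real cluster size) of a volume can be checked with the built-in fsutil tool from an elevated prompt:

```powershell
# Prints volume details including 'Bytes Per Cluster' and
# 'Bytes Per FileRecord Segment' (1024 standard, 4096 with largeFRS)
fsutil fsinfo ntfsinfo C:
```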

Using the answer above (by CB), I found the returned size was always 4127 bytes above either the correct size on disk or the actual size (evidently related to my 4096-byte cluster size). In the cases where it was above the actual size, the files I tested were either 0 bytes on disk or their size on disk was bigger than their actual size.

I also found that files above UInteger.MaxValue (4294967295 bytes) reported incorrect sizes, which the code below also handles correctly. This required widening the variable types (from UInt32 and Int64 to Double). Note that I've used arithmetic to reassemble the final size; see the comments for a bitwise way.

I used the following code to get the most accurate answer. If it's wrong, the returned size will be exactly the same as the actual size, which happens when the file occupies 0 bytes on disk or when the size on disk is bigger than the actual size:

using System;
using System.Runtime.InteropServices;

public class ExtendedFileInfo
{
    public static double GetFileSizeOnDisk(string file)
    {
        uint hosize;
        uint losize = GetCompressedFileSizeW(file, out hosize);
        // The 1L forces long arithmetic so (uint.MaxValue + 1) doesn't overflow
        // to 0; this reassembles the 64-bit size from the two 32-bit halves.
        double size = (uint.MaxValue + 1L) * hosize + losize;
        return size;
    }
    
    [DllImport("kernel32.dll")]
    static extern uint GetCompressedFileSizeW(
        [In, MarshalAs(UnmanagedType.LPWStr)] string lpFileName,
        [Out, MarshalAs(UnmanagedType.U4)] out uint lpFileSizeHigh);
}
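As a sanity check, the arithmetic and the bitwise ways of reassembling the 64-bit value can be compared directly from PowerShell (v3+ for the -shl/-bor operators); the high/low sample values here are made up:

```powershell
$hosize = 2        # example high 32 bits (made-up value)
$losize = 123456   # example low 32 bits (made-up value)
$arithmetic = (4294967296 * $hosize) + $losize        # (uint.MaxValue + 1L) * hi + lo
$bitwise    = ([uint64]$hosize -shl 32) -bor $losize  # ((ulong)hi << 32) | lo
$arithmetic -eq $bitwise   # True
```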

And the VB.Net version:

Imports System
Imports System.Runtime.InteropServices

Public Class ExtendedFileInfo
    Public Shared Function GetFileSizeOnDisk(file As String) As Double
        Dim hosize As UInteger
        Dim losize As UInteger = GetCompressedFileSizeW(file, hosize)
        Dim size As Double = (UInteger.MaxValue + 1) * hosize + losize
        Return size
    End Function

    <DllImport("kernel32.dll")> _
    Private Shared Function GetCompressedFileSizeW(
        <[In], MarshalAs(UnmanagedType.LPWStr)> lpFileName As String,
        <Out, MarshalAs(UnmanagedType.U4)> ByRef lpFileSizeHigh As UInteger) As UInteger
    End Function
End Class
Walkman
  • None of these handle hosize correctly. You're converting to double too late, after the data type has already overflowed. You want `((ulong)hosize << 32) | losize` (in C#, not sure how to write that in VB but it should use left-bitshift and bitwise-OR operations there too) – Ben Voigt Apr 16 '21 at 20:28
  • My point is that in your code `uint.MaxValue + 1` overflows to zero (since the common type of both operands is `uint`, the result of the addition is a `uint` which is too small to hold the result). You then compute `hosize * 0 + losize`, which is not at all what is wanted. I chose to reassemble using bitwise operations simply because that's how `GetCompressedFileSizeW` is defined... low 32 bits here, high 32 bits there, but my point was about the order of casting and operations. – Ben Voigt Apr 23 '21 at 16:42
  • I was going to write a comment about how this does work perfectly (I'm using the VB version and it has always given correct values), but I tested the C# version and it doesn't even compile, so exactly what you said - the convert to double is too late. I used the CodeConverter extension by IC#Code (what I use nowadays), and the difference is it made the calculation `uint.MaxValue + 1L` - so now it uses the `long` `+` operator: https://i.imgur.com/j7NNGNn.png – Walkman Apr 23 '21 at 17:26
  • with this, I've tested both ways, and they do work perfectly: https://i.imgur.com/kGJf9FQ.png - I prefer the mathematical way, as it's easier to understand, don't need to know bitwise operators. I am going to update my answer to include this, should actually work now – Walkman Apr 23 '21 at 17:26
  • @BenVoigt yes sorry, hit enter too soon - I forgot I have to use Shift + Enter to insert a line break (which I now see doesn't even work) – Walkman Apr 23 '21 at 17:27
  • I'm not sure how you knew what to multiply by without starting from "upper 32 bits" and "lower 32 bits" but my main point wasn't to use `<< 32` instead of `* something` where something is `1L << 32` no matter how you calculate it, it was about the data types. It seems that VB.NET is much more aggressive about promoting things to larger types. – Ben Voigt Apr 23 '21 at 18:21