I'm working with a legacy system that has numerous imports from external systems, most of which work by downloading a file (of varying size depending on context), processing it, and then storing the file elsewhere on a SAN volume (formatted as NTFS and mounted on a Windows Server 2008 R2 box). The problem we're having is that the sheer volume of little files ends up wasting large amounts of disk space due to the cluster size.
Ideally we'd locate the worst-offending import processes and put some automated archiving in place, rolling the files up into .zip files or similar. Building a report on this should be a relatively simple problem, but I'm struggling to get an accurate "size on disk" figure (as seen in Explorer). (Yes, we could just archive everything after X days, but that's not ideal and doesn't necessarily help tune the import processes, which could be adapted somewhat to avoid the issue.)
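To give an idea of the kind of report I'm after, here's a rough sketch (assuming PowerShell 3.0+ on whichever box runs it; the cluster size and root path are placeholders I'd need to swap for the real values):

```powershell
$clusterSize = 64KB            # placeholder -- needs to match the volume's real allocation unit
$root        = 'D:\Imports'    # placeholder root folder for the import drops

Get-ChildItem -Path $root -Recurse -File |
    Group-Object DirectoryName |
    ForEach-Object {
        $logical = ($_.Group | Measure-Object Length -Sum).Sum
        # round each file up to a whole number of clusters to estimate size on disk
        $onDisk  = ($_.Group |
            ForEach-Object { [math]::Ceiling($_.Length / $clusterSize) * $clusterSize } |
            Measure-Object -Sum).Sum
        [pscustomobject]@{
            Folder      = $_.Name
            Files       = $_.Count
            LogicalMB   = [math]::Round($logical / 1MB, 2)
            EstOnDiskMB = [math]::Round($onDisk / 1MB, 2)
            WastedMB    = [math]::Round(($onDisk - $logical) / 1MB, 2)
        }
    } |
    Sort-Object WastedMB -Descending |
    Select-Object -First 20
```

Hard-coding the cluster size gets me a rough ranking, but I'd much rather measure each file's real allocation.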
I've seen answers like How to get the actual size-on-disk of a file from PowerShell?, but whilst they work well for compressed folders, for short files I just get back the same value as the file length and so underestimate the true disk usage.
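What I've been experimenting with is calling GetCompressedFileSizeW myself and then rounding the result up to the volume's cluster size from GetDiskFreeSpaceW, on the assumption that this is roughly what Explorer does. The class/function names below are my own invention and the example path is made up, so treat it as a sketch rather than a tested solution:

```powershell
Add-Type -TypeDefinition @"
using System;
using System.Runtime.InteropServices;

public static class DiskUsage
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern uint GetCompressedFileSizeW(string lpFileName, out uint lpFileSizeHigh);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern bool GetDiskFreeSpaceW(string lpRootPathName,
        out uint lpSectorsPerCluster, out uint lpBytesPerSector,
        out uint lpNumberOfFreeClusters, out uint lpTotalNumberOfClusters);
}
"@

function Get-SizeOnDisk {
    param([string]$Path)

    $file = Get-Item -LiteralPath $Path

    # Cluster size of the volume the file lives on (the root needs a trailing
    # backslash, particularly for UNC paths)
    $root = [System.IO.Path]::GetPathRoot($file.FullName)
    if (-not $root.EndsWith('\')) { $root += '\' }
    $spc = [uint32]0; $bps = [uint32]0; $freeC = [uint32]0; $totalC = [uint32]0
    if (-not [DiskUsage]::GetDiskFreeSpaceW($root, [ref]$spc, [ref]$bps, [ref]$freeC, [ref]$totalC)) {
        throw "GetDiskFreeSpaceW failed for '$root'"
    }
    $clusterSize = $spc * $bps

    # Allocated length: the compressed size if NTFS compression is on, otherwise the logical length
    $high = [uint32]0
    $low  = [DiskUsage]::GetCompressedFileSizeW($file.FullName, [ref]$high)
    $length = ([uint64]$high * 0x100000000) + $low

    # Round up to a whole number of clusters
    [uint64][math]::Ceiling($length / $clusterSize) * $clusterSize
}

# e.g. Get-SizeOnDisk '\\fileserver\imports\tiny.xml'   (made-up path)
```

One gap I can see is that files small enough to be resident in the MFT will still be counted as a full cluster here, and I'm not sure whether that matches what Explorer reports for them.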
The files on the volume vary: some are small enough to fit inside the MFT records, some only occupy a small percentage of a cluster, and others are very large. NTFS compression isn't enabled anywhere on the volume, though a solution that could accommodate it would be more future-proof, as we may enable it in future. The volume is normally accessed via a UNC share, so if it's possible to determine usage via the share (Explorer seems able to) that would be great, but it's not essential, as the script can always run on the server itself and access the drive directly.
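For reference, this is how I've been checking the allocation unit size without logging on to the file server; the server name and drive letter are placeholders (and if the SAN volume is mounted as a folder mount point rather than a drive letter, the filter would need adjusting):

```powershell
# Win32_Volume's BlockSize is the allocation unit (cluster) size in bytes
Get-WmiObject -Class Win32_Volume -ComputerName 'FILESERVER01' -Filter "DriveLetter = 'E:'" |
    Select-Object Name, Label, FileSystem, BlockSize

# Or, locally on the server itself:
#   fsutil fsinfo ntfsinfo E:    (look for "Bytes Per Cluster")
```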