Possible Duplicate:
Get Size of file on disk

Is there a way to retrieve the actual number of bytes used for a particular file on disk, using C# on Windows?

My application implements "watch" folders, similar to FileSystemWatcher. Some watch folders exist on shared storage (both network and fibre channel SAN), others on locally attached storage.

Files ranging from 1 GB to over 500 GB are copied into a watch folder by processes that are completely out of my control. Because of the nature of shared file systems, hoping for an exception when opening files "exclusively" (FileMode.Open, FileAccess.Read, FileShare.None) doesn't work either.

These watch folders are not supposed to take action until a file is completely copied/closed - otherwise problems downstream can occur.

To get the "real" file size used on disk I have tried:

  • System.IO.FileInfo
  • GetFileSizeEx (kernel32 p/invoke)
  • FindFirstFileEx (kernel32 p/invoke)
  • GetCompressedFileSize & GetDiskFreeSpace (kernel32 p/invoke)

Any suggestions would be very much appreciated. It seems like I'm dealing with a limitation of the Windows OS?

Community
  • It depends on the OS - See http://stackoverflow.com/questions/3750590/get-size-of-file-on-disk – ehambright Dec 19 '12 at 19:07
  • @ehambright - I only posted a question because I did not find another solution online. I have come across the above post several times, and there isn't anything there that I haven't tried or didn't refer to in my question (except for using WMI to return cluster size, which does not help, as it still comes back to using GetCompressedFileSize). Sorry to be blunt, Windows is becoming quite frustrating! ;-) – James Heliker Dec 19 '12 at 19:16
  • I don't understand why you are talking about watch folders, exclusive mode etc. What has that got to do with size of the file? – David Heffernan Dec 19 '12 at 20:27
  • Can you elaborate on the difference between "bytes used on disk" and "file size"? `GetCompressedFileSize` tells you the number of physical bits on the hard drive consumed by the file (after compression and after removing sparse sectors). From your description, it sounds like you want "the offset of the highest byte actually written to the file." There is no way to get that value. – Raymond Chen Dec 19 '12 at 21:14
  • @RaymondChen: a disk is divided into fixed-sized sectors and clusters. A file consumes so many sectors. If the end of the file only fills a portion of a sector, the whole sector is still allocated. If that sector only fills a portion of a cluster, the whole cluster is still allocated. So "bytes used" can be larger than "file size". To get the "bytes used", you have to round up the "file size" to the next cluster boundary. You have to ask the file system what its sector and cluster sizes are, such as via `GetDiskFreeSpace()`, as they are dynamically set when the file system is first created. – Remy Lebeau Dec 19 '12 at 21:31
  • @RaymondChen: thanks for answering this. I am indeed looking for "the offset of the highest byte actually written to the file." - if there is no way to get this, I guess I'm out of luck with Windows. Remy, my objective is not to find out how much space has been allocated for a file, but rather how much of the file is actually written on disk. – James Heliker Dec 19 '12 at 22:12
  • So, you're watching for files to be written to a directory, and your problem is that you don't know how to detect when the entire file has been written. The idea you seem to be pursuing right now is the notion that *space* for the file has been allocated, but the space hasn't actually been *written* yet, so if you can detect how much of the allocated space has been used, you can then detect when it's *all* been used, and conclude that the file is complete. But that's not how file-writing works. *If* space is pre-allocated, then *all* that space is "the file," and some merely gets overwritten. – Rob Kennedy Dec 20 '12 at 00:11

3 Answers

There's no API to get the "size on disk" of a file; you have to calculate it yourself based on your knowledge of the disk geometry.

Jonathan Potter

To summarize, the issue here is that the other applications copy the file by creating the destination, then setting the file size to the final size (let's say 1 GB), for example with SetEndOfFile, essentially creating a 1 GB file filled completely with zeroes.

The applications then seek back to the beginning of the file and start filling it sequentially with data. The goal is to determine what point in the "filling it with data" the other applications have reached.

The unfortunate answer is that there is no way to obtain this information. As far as the file system is concerned, the file exists and its size is 1 GB. It so happens that the file is being actively updated without changing the size, but those internal changes are not audited in file metadata. There is no "last modified time" for individual bytes of a file.
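To illustrate the behaviour described above, here is a minimal sketch. It assumes the copying application behaves like the hypothetical `PreallocationDemo` writer below, and it uses `FileStream.SetLength` as a managed stand-in for whatever call the real application makes. A second handle sees the final length even though only a few bytes have been filled in.

```csharp
using System;
using System.IO;

static class PreallocationDemo
{
    // Mimics the assumed writer: set the final length first, then fill from the start.
    // Returns the length seen through a second handle after only bytesWritten bytes are filled.
    public static long ReportedLengthAfterPartialFill(string path, long finalSize, int bytesWritten)
    {
        using (var writer = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.ReadWrite))
        {
            writer.SetLength(finalSize);            // file now has its final size; unwritten bytes read as zero

            var data = new byte[bytesWritten];
            for (int i = 0; i < data.Length; i++) data[i] = 0xFF;
            writer.Write(data, 0, data.Length);     // fill sequentially from offset 0
            writer.Flush();

            // Query the size through an independent read handle while the writer is still open.
            using (var reader = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                return reader.Length;
            }
        }
    }
}
```

The reported length equals `finalSize` no matter how small `bytesWritten` is, which is why size-based checks cannot tell how far the fill has progressed.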

If you have domain-specific knowledge of the files (e.g. you know that the last byte is never zero) you can poll the last byte of the file and back off if it is zero.
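A minimal sketch of that polling idea follows. It only works under the stated assumption that a finished file never ends in a zero byte. The class and method names (`LastByteProbe`, `LastByteIsNonZero`, `WaitUntilFilled`) are hypothetical, and the poll interval and timeout are left to the caller.

```csharp
using System;
using System.IO;
using System.Threading;

static class LastByteProbe
{
    // Returns true when the final byte of the file is non-zero.
    // Only meaningful if a completed file is known never to end in a zero byte.
    public static bool LastByteIsNonZero(string path)
    {
        // FileShare.ReadWrite so the probe does not get in the way of the process still writing.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (stream.Length == 0) return false;
            stream.Seek(-1, SeekOrigin.End);
            return stream.ReadByte() > 0;   // ReadByte returns -1 at end of stream, 0 for a zero byte
        }
    }

    // Polls the last byte until it is non-zero or the timeout elapses.
    public static bool WaitUntilFilled(string path, TimeSpan pollInterval, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (LastByteIsNonZero(path)) return true;
            Thread.Sleep(pollInterval);     // back off before probing again
        }
        return LastByteIsNonZero(path);     // one final check at the deadline
    }
}
```

A non-zero last byte only shows that the writer has reached the end of the file. It does not show that the writer has closed its handle, and if the writer opened the file without sharing, the probe's open call can throw an `IOException` that the caller would need to treat as "not ready yet".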

Raymond Chen

Using GetDiskFreeSpace you can get enough information to calculate the cluster size (sectors per cluster multiplied by bytes per sector). Round the file size up to the next multiple of this value to get the actual size on disk, or a good approximation of it.
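The rounding could be sketched as below. The `SizeOnDisk` class and its method names are hypothetical. The P/Invoke part runs on Windows only, and the estimate ignores compression, sparse ranges, and very small files that NTFS may store inside the MFT record.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

static class SizeOnDisk
{
    // kernel32 GetDiskFreeSpace (resolved to GetDiskFreeSpaceW via CharSet.Unicode).
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool GetDiskFreeSpace(
        string lpRootPathName,
        out uint lpSectorsPerCluster,
        out uint lpBytesPerSector,
        out uint lpNumberOfFreeClusters,
        out uint lpTotalNumberOfClusters);

    // Windows only: cluster size in bytes for the volume rooted at rootPath, e.g. @"C:\".
    public static long GetClusterSize(string rootPath)
    {
        if (!GetDiskFreeSpace(rootPath, out uint sectorsPerCluster, out uint bytesPerSector, out _, out _))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return (long)sectorsPerCluster * bytesPerSector;
    }

    // Pure arithmetic: round fileSize up to the next multiple of clusterSize.
    public static long RoundUpToCluster(long fileSize, long clusterSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (clusterSize <= 0) throw new ArgumentOutOfRangeException(nameof(clusterSize));
        return (fileSize + clusterSize - 1) / clusterSize * clusterSize;
    }

    // Approximate size on disk for an uncompressed, non-sparse file.
    public static long Estimate(string path)
    {
        string root = Path.GetPathRoot(Path.GetFullPath(path));
        // GetDiskFreeSpace expects a trailing backslash on UNC roots such as \\server\share\.
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;
        return RoundUpToCluster(new FileInfo(path).Length, GetClusterSize(root));
    }
}
```

With a 4096-byte cluster, for example, `SizeOnDisk.RoundUpToCluster(4097, 4096)` yields 8192. This figure describes allocated space, not how much of a pre-allocated file has been written so far.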

Jens Björnhager