
How do I get the actual file size on disk in Python? (The actual size it takes up on the hard drive.)

– Alex
  • You mean rounded up by cluster size? – ruslik Nov 25 '10 at 08:18
  • Take a look at this question: http://stackoverflow.com/questions/2493172/determine-cluster-size-of-file-system-in-python – Ruel Nov 25 '10 at 08:24
  • @ruslik: It's not that simple. Consider e.g. sparse or compressed files, which can take less space than their size indicates. – Philipp Nov 25 '10 at 09:13

7 Answers


UNIX only:

import os
from collections import namedtuple

_ntuple_diskusage = namedtuple('usage', 'total used free')

def disk_usage(path):
    """Return disk usage statistics about the given path.

    Returned value is a named tuple with attributes 'total', 'used' and
    'free', which are the amount of total, used and free space, in bytes.
    """
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return _ntuple_diskusage(total, used, free)

Usage:

>>> disk_usage('/')
usage(total=21378641920, used=7650934784, free=12641718272)
>>>

Edit 1 - also for Windows: https://code.activestate.com/recipes/577972-disk-usage/?in=user-4178764

Edit 2 - this is also available in Python 3.3+: https://docs.python.org/3/library/shutil.html#shutil.disk_usage
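
For Python 3.3+, a minimal sketch of the shutil version; it returns the same kind of named tuple (output shape shown with placeholders):

>>> import shutil
>>> shutil.disk_usage('/')
usage(total=..., used=..., free=...)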

– Giampaolo Rodolà

Here is the correct way to get a file's size on disk, on platforms where st_blocks is set:

import os

def size_on_disk(path):
    st = os.stat(path)
    return st.st_blocks * 512

Other answers that indicate to multiply by os.stat(path).st_blksize or os.statvfs(path).f_bsize are simply incorrect.

The Python documentation for os.stat_result.st_blocks very clearly states:

st_blocks
Number of 512-byte blocks allocated for file. This may be smaller than st_size/512 when the file has holes.

Furthermore, the stat(2) man page says the same thing:

blkcnt_t  st_blocks;      /* Number of 512B blocks allocated */
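
To see the distinction in practice, here is a quick illustration of my own (POSIX-only, and it assumes the filesystem supports sparse files): a file with a hole reports a large st_size but few or no allocated blocks.

import os

# Create a 1 MiB sparse file: extend it without writing any data.
with open('sparse.bin', 'wb') as f:
    f.truncate(1024 * 1024)

st = os.stat('sparse.bin')
print(st.st_size)          # 1048576 -- the logical size
print(st.st_blocks * 512)  # typically 0 -- nothing allocated yet

os.remove('sparse.bin')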
– Jonathon Reinhart

Update 2021-03-26: Previously, my answer rounded the logical size of the file up to an integer multiple of the block size. That approach only works if the file is stored in a contiguous sequence of blocks on disk (or if all the blocks are full except for one). Since this is a special case (though common for small files), I have updated my answer to make it more generally correct. Note, however, that unfortunately the statvfs method and the st_blocks value may not be available on some systems (e.g., Windows 10).

Call os.stat(filename).st_blocks to get the number of blocks in the file.

Call os.statvfs(filename).f_bsize to get the filesystem block size.

Then compute the correct size on disk, as follows:

num_blocks = os.stat(filename).st_blocks
block_size = os.statvfs(filename).f_bsize
size_on_disk = num_blocks * block_size
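
Since, as noted above, st_blocks and os.statvfs may be missing on some systems, a defensive variant might look like this (my own sketch; the function name is made up):

import os

def size_on_disk_or_none(path):
    """Return the allocated size in bytes, or None where the needed
    attributes are unavailable (e.g., on Windows)."""
    st = os.stat(path)
    if not hasattr(st, 'st_blocks') or not hasattr(os, 'statvfs'):
        return None
    return st.st_blocks * os.statvfs(path).f_bsize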
– hft
  • `((lSize-1)/bSize+1)*bSize` might be slightly more accurate. Thanks for correcting my ancient and wrong answer. – ephemient Jan 20 '15 at 16:19
  • `Deprecated since version 2.6: The statvfs module has been removed in Python 3.` :-( https://docs.python.org/2/library/statvfs.html – danodonovan Aug 12 '15 at 12:36
  • @danodonovan It looks like the `statvfs` module has been removed in Python 3, but the answer uses the `os` module. As you can see, the [documentation for Python 3](https://docs.python.org/3/library/os.html#os.statvfs) reveals that `os.statvfs` is still around and has even been updated to include new functionality as recently as Python 3.6. – bytesized Jan 10 '17 at 22:11
  • I am having a situation with larger files where both of your formulae are giving me a value that is 1 block (4,096 bytes) smaller than what du gives me. For example, if you create a file using the command `dd if=/dev/zero of=testsize bs=1 count=419472426`. Said another way, the difference between du's results using the --apparent-size option is off by 7,126 instead of 4,096. Note: the value from du's --apparent-size option does match the value obtained using `os.stat(filename).st_size`. – user1748155 Jul 12 '18 at 05:05
  • According to POSIX – https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html – "There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members". So, unless Python is making some stronger guarantee than POSIX does here, the assumption that f_bsize returned by statvfs is the correct units for st_blocks may not always be accurate. – Simon Kissane Jan 28 '23 at 03:30
import os

st = os.stat(…)
du = st.st_blocks * st.st_blksize
– ephemient
  • +1, didn't realise this was in `os.stat`! I was about to refer the questioner to [`win32file.DeviceIoControl`](http://docs.activestate.com/activepython/2.5/pywin32/win32file__DeviceIoControl_meth.html). Don't know why I assumed the OP was on Windows :P – fmark Nov 25 '10 at 08:30
  • "On some Unix systems (such as Linux), the following attributes may also be available: st_blocks (number of blocks allocated for file), st_blksize (filesystem blocksize)..." – i.e. that's not portable, and you should at least catch the exception that is raised when these members aren't available. – Philipp Nov 25 '10 at 09:15
  • Careful, this is wrong! On Linux, `st.st_blocks` is *always* in units of 512 bytes, while `st.st_blksize` is a filesystem blocksize (typically 4096 bytes). The real usage is `st.st_blocks * 512`. See http://linux.die.net/man/2/stat for details. – Jim Paris Aug 05 '13 at 16:24
  • No, you're both wrong: st.st_blocks is NOT ALWAYS in units of 512 bytes. On my machine it is in units of 1024 (which is strange indeed). Additionally, the answer is wrong because st_blksize does not return 1024; it returns the file I/O block size, e.g., st_blksize returns 65536 on my machine. For example, on my Dell laptop running Python 2.7.8 on Cygwin on Windows 7, I created a 3000-byte file ("dd if=/dev/zero bs=3000 count=1 of=./testfile.txt") and: os.stat("testfile.txt").st_blocks=4; os.stat("./testfile.txt").st_blksize=65536; the logical size is 3000, on disk is 4096. I will answer below. – hft Jan 17 '15 at 23:06
  • Can you please update your answer to refer to @hft's answer below? – Miserable Variable Apr 25 '18 at 18:37

Practically 12 years and no answer on how to do this on Windows...

Here's how to find the 'Size on disk' on Windows via ctypes:

import ctypes

def GetSizeOnDisk(path):
    '''https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getcompressedfilesizew'''
    GetCompressedFileSizeW = ctypes.windll.kernel32.GetCompressedFileSizeW
    GetCompressedFileSizeW.restype = ctypes.c_ulong  # returns the low-order DWORD of the size
    filesizehigh = ctypes.c_ulong(0)  # receives the high-order DWORD, needed for files > 4 GiB
    low = GetCompressedFileSizeW(ctypes.c_wchar_p(path), ctypes.pointer(filesizehigh))
    return (filesizehigh.value << 32) + low

'''
>>> os.stat(somecompressedorofflinefile).st_size
943141
>>> GetSizeOnDisk(somecompressedorofflinefile)
671744
>>>
'''
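
One caveat, per the Microsoft docs linked above: on failure the call returns INVALID_FILE_SIZE (0xFFFFFFFF), so a more defensive version would compare the returned low DWORD against that value and call GetLastError to distinguish an error from a file that is genuinely that size.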
– J_K
  • Thank you! I was looking all over for this. Curiously, when OneDrive shows the status of a file as "Available when online" your function almost always returns a size of zero, which is what I want. But for some strange reason it sometimes shows the full size, even when the file is available only when online. No idea why. – Michael Aug 10 '22 at 03:43

I'm not certain if this is size on disk, or the logical size:

import os
filename = "/home/tzhx/stuff.wev"
size = os.path.getsize(filename)

If it's not the droid you're looking for, you can round it up by dividing by the cluster size (as a float), then using ceil, then multiplying; see the sketch below.
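
A minimal sketch of that rounding (the cluster size here is a placeholder; query your filesystem for the real value, e.g. via the question linked in the comments above):

import math
import os

filename = "/home/tzhx/stuff.wev"
cluster_size = 4096  # placeholder; determine this for your filesystem

logical_size = os.path.getsize(filename)
rounded_up = int(math.ceil(logical_size / float(cluster_size))) * cluster_size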

– TZHX
  • When I used getsize() on Windows 7, Python 2.2, I did get the actual space the file occupies. In my case, I want just the "file size", not the "file space". I wonder how you can get just the file size. – Allan Ruin Aug 02 '12 at 17:43

To get the disk usage for a given file/folder, you can do the following:

import os

def disk_usage(path):
    """Return cumulative number of bytes for a given path."""
    # get total usage of current path
    total = os.path.getsize(path)
    # if path is dir, collect children
    if os.path.isdir(path):
        for file_name in os.listdir(path):
            child = os.path.join(path, file_name)
            # recursively get byte use for children
            total += disk_usage(child)
    return total

The function recursively collects byte usage for files nested within a given path, and returns the cumulative use for the entire path. You could also add a print("{}: {}".format(path, total)) in there if you want the information for each file to be printed.
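
Usage is a single call on a file or directory, e.g. disk_usage('/home/tzhx') (illustrative path). Note that this sums logical sizes (os.path.getsize reports st_size), so as discussed in the other answers, the total can differ from the space actually allocated on disk.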

– Jared Wilber