338

A function to return human readable size from bytes size:

>>> human_readable(2048)
'2 kilobytes'
>>>

How to do this?

Chris_Rands
Sridhar Ratnakumar
  • 2
    I think this falls under the heading of "too small a task to require a library". If you look at the source for hurry.filesize, there's only a single function, with a dozen lines of code. And even that could be compacted. – Ben Blank Jul 07 '09 at 21:09
  • 17
    The advantage of using a library is that it is usually tested (contains tests that can be run in case if one's edit introduces a bug). If you add the tests, then it is not anymore 'dozen lines of code' :-) – Sridhar Ratnakumar Jul 07 '09 at 21:49
  • 1
    The amount of re-inventing the wheel in python community is crazy and ridiculous. Just ls -h /path/to/file.ext will do the job. Having said that, the accepted answer is doing a good job. Kudo. – Edward Aung Apr 14 '19 at 23:49
  • 2
    [2048 bytes](http://www.lonniebest.com/DataUnitConverter/#2048B) = [2 kibibytes](http://www.lonniebest.com/DataUnitConverter/#2KiB) (not kilobytes). – Lonnie Best Oct 25 '21 at 04:03

27 Answers

732

Addressing the above "too small a task to require a library" issue by a straightforward implementation (using f-strings, so Python 3.6+):

def sizeof_fmt(num, suffix="B"):
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        if abs(num) < 1024.0:
            return f"{num:3.1f}{unit}{suffix}"
        num /= 1024.0
    return f"{num:.1f}Yi{suffix}"

Supports:

  • all currently known binary prefixes
  • negative and positive numbers
  • numbers larger than 1000 Yobibytes
  • arbitrary units (maybe you like to count in Gibibits!)

Example:

>>> sizeof_fmt(168963795964)
'157.4GiB'

by Fred Cirera

ideasman42
Sridhar Ratnakumar
  • I thought `num /= 1024.0` style was discouraged in Python... I am surprised it's even legal... – markvgti Feb 01 '14 at 07:58
  • Isn't %3.1f kind of useless? The string will always have a length of 3 and more, right? – Matt3o12 Aug 05 '14 at 20:17
  • 9
    There should be a space between the number and the unit. If you are outputting html or latex it should be a non-breaking-space. – josch Nov 21 '14 at 14:57
  • 5
    just a thought, but for any(?) suffix other than `B` (i.e. for units other than bytes) you'd want the factor to be `1000.0` rather than `1024.0` no? – Anentropic Dec 23 '14 at 10:15
  • another nitpick: kilo, mega, etc. are called [metric prefixes](https://en.wikipedia.org/wiki/Metric_prefix) whereas **B**ytes is a unit. – Harvey Jan 27 '15 at 13:01
  • 7
    If you want to increase the precision of the decimal component,change the `1` on lines 4 and 6 to whatever precision you want. – Matthew G May 10 '15 at 04:07
  • 3
    cool! I liked it some much I converted it to Go lang : http://play.golang.org/p/68w_QCsE4F – eSniff Jun 02 '15 at 05:47
  • The last `Yi` could go in the format string, so there's one less item to substitute. – xOneca Jul 26 '15 at 16:24
  • 74
    sure would be nice if all this iteration on this "too small a task" were captured and encapsulated into a library with tests. – fess . Feb 27 '16 at 22:03
  • 1
    @fess - there are two such libraries in http://stackoverflow.com/a/15485265/1174784. also, note that the above code has been adopted by borg backup and improved for Python3: https://github.com/borgbackup/borg/blob/master/src/borg/helpers.py#L745 – anarcat Nov 10 '16 at 15:28
  • 1
    @anarcat it seems moved to https://github.com/borgbackup/borg/blob/master/src/borg/helpers/parseformat.py#L256 now. – Jeroen Wiert Pluimers Oct 14 '18 at 18:31
  • 3
    `sizeof_fmt(5)` returns `5.0B`. There shouldn't be a precision for bytes. – Mr. Clear Apr 17 '19 at 12:21
  • @anarcat thanks for the link. I created a [PR](https://github.com/borgbackup/borg/pull/5199) with further improvements. – darkdragon May 26 '20 at 17:02
  • **It's really interesting that this and all of the other answers contain a serious bug which makes the functions break completely on certain input. I don't mean to hijack this post, but I want to help people find a working, high-performance function instead. This answer explains why the others are bugged:** https://stackoverflow.com/a/63839503/8874388 **(follow the link to see the answer)** – Mitch McMabers Sep 11 '20 at 01:33
  • @MitchMcMabers calm down. the function doesn't "break completely", it just displays 1024. no biggie – jemand771 Jul 13 '21 at 08:41
  • @MitchMcMabers actually you did mean to hijack this post. your "serious bug" is a minor *cosmetic* bug at best and you could have offered a minor fix for it like so many others have. Instead you wrote in bold text and forced people to go read your answer just to find out what the bug is. -1 just for that. BTW the 1024 issue is easily fixed with a `round()` on line 3 of **this** answer. – Philip Couling Nov 24 '21 at 01:08
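As the comments above point out, a value just below a unit boundary (for example 1024**2 - 1 bytes) is formatted as '1024.0KiB' because the comparison happens before rounding. A minimal sketch of the round()-based fix suggested there follows; the name sizeof_fmt_rounded and the precision parameter are my own additions for illustration, not part of the original answer.

```python
def sizeof_fmt_rounded(num, suffix="B", precision=1):
    """Variant of sizeof_fmt that compares the *rounded* value against 1024."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        # Round first, so 1023.999 is treated as 1024.0 and moves to the next unit.
        if abs(round(num, precision)) < 1024.0:
            return f"{num:.{precision}f}{unit}{suffix}"
        num /= 1024.0
    return f"{num:.{precision}f}Yi{suffix}"


print(sizeof_fmt_rounded(1024**2 - 1))   # 1.0MiB rather than 1024.0KiB
print(sizeof_fmt_rounded(168963795964))  # 157.4GiB, same as the original
```

Whether this cosmetic case matters depends on the use; the only behavioural change is at unit boundaries.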
187

The humanize library covers this: humanize.naturalsize() appears to do everything you're looking for.

Example code (python 3.10)

import humanize

disk_sizes_list = [1, 100, 999, 1000,1024, 2000,2048, 3000, 9999, 10000, 2048000000, 9990000000, 9000000000000000000000]
for size in disk_sizes_list:
    natural_size = humanize.naturalsize(size)
    binary_size = humanize.naturalsize(size, binary=True)
    print(f" {natural_size} \t| {binary_size}\t|{size}")

Output

 1 Byte     | 1 Byte    |1
 100 Bytes  | 100 Bytes |100
 999 Bytes  | 999 Bytes |999
 1.0 kB     | 1000 Bytes    |1000
 1.0 kB     | 1.0 KiB   |1024
 2.0 kB     | 2.0 KiB   |2000
 2.0 kB     | 2.0 KiB   |2048
 3.0 kB     | 2.9 KiB   |3000
 10.0 kB    | 9.8 KiB   |9999
 10.0 kB    | 9.8 KiB   |10000
 2.0 GB     | 1.9 GiB   |2048000000
 10.0 GB    | 9.3 GiB   |9990000000
 9.0 ZB     | 7.6 ZiB   |9000000000000000000000
Jayan
Pyrocater
  • 23
    Some examples using the data from the OP: `humanize.naturalsize(2048) # => '2.0 kB'` , `humanize.naturalsize(2048, binary=True) # => '2.0 KiB'` `humanize.naturalsize(2048, gnu=True) # => '2.0K'` – RubenLaguna Jan 27 '16 at 10:10
48

The following works in Python 3.6+, is, in my opinion, the easiest-to-understand answer on here, and lets you customize the number of decimal places used.

def human_readable_size(size, decimal_places=2):
    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']:
        if size < 1024.0 or unit == 'PiB':
            break
        size /= 1024.0
    return f"{size:.{decimal_places}f} {unit}"
hostingutilities.com
44

There's always got to be one of those guys. Well today it's me. Here's a one-liner -- or two lines if you count the function signature.

def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']):
    """ Returns a human readable string representation of bytes """
    return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:])

>>> human_size(123)
'123 bytes'
>>> human_size(123456789)
'117MB'

If you need sizes bigger than an Exabyte, it's a little bit more gnarly:

def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']):
    return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:]) if units[1:] else f'{bytes>>10}ZB'
Gilad Peleg
hostingutilities.com
  • 2
    FYI, the output will always be rounded down. – hostingutilities.com May 03 '17 at 03:10
  • 1
    wouldn't it be better to assign the default list for units inside the method to avoid using a list as a default argument? (and using `units=None` instead) – Ima Nov 20 '17 at 16:02
  • 3
    @ImanolEizaguirre Best practices would state that it's a good idea to do as you suggested, so you don't inadvertently introduce bugs into a program. However, this function as it is written is safe because the units list is never manipulated. If it was manipulated, the changes would be permanent, and any subsequent function calls would receive a manipulated version of the list as the default argument for the units argument. – hostingutilities.com Nov 20 '17 at 22:14
  • 2
    For Python 3, if you want a decimal point, use this instead: ``` def human_size(fsize, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']): return "{:.2f}{}".format(float(fsize), units[0]) if fsize < 1024 else human_size(fsize / 1024, units[1:]) ``` – Omer Tuchfeld Jul 21 '19 at 08:13
  • @OmerTuchfeld +1 because this way the resulting size is more accurate. Also, I'm divided on whether the units should be called KiB, MiB, etc. or not. – Anchith Acharya Nov 19 '20 at 10:48
  • If the constant used is 1024 then KiB, MiB, etc is the correct naming. Otherwise if the constant is 1000 then the correct naming is KB, MB, etc – Omer Tuchfeld Nov 19 '20 at 12:59
  • 1
    avoid the gnarly ZB if...else with `... if bytes < 1024 or len(units)==1 else human_size(...` – Ed Randall Mar 04 '21 at 16:08
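The two suggestions from the comments (avoiding a mutable default argument and guarding the last unit with a length check) can be combined. This is only a sketch; the names UNITS and human_size_safe are hypothetical, and like the original it truncates rather than rounds.

```python
# A tuple cannot be mutated, so sharing it as a default argument is harmless.
UNITS = (" bytes", "KB", "MB", "GB", "TB", "PB", "EB")


def human_size_safe(nbytes, units=UNITS):
    """Recursive formatter that stops at the last unit instead of running out."""
    if nbytes < 1024 or len(units) == 1:
        return f"{nbytes}{units[0]}"
    # Shift right by 10 bits (integer division by 1024) and move to the next unit.
    return human_size_safe(nbytes >> 10, units[1:])


print(human_size_safe(123))        # 123 bytes
print(human_size_safe(123456789))  # 117MB
print(human_size_safe(1 << 80))    # 1048576EB
```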
39

Here's my version. It does not use a for-loop. It has constant complexity, O(1), and is in theory more efficient than the answers here that use a for-loop.

from math import log
# list() is required on Python 3, where zip() returns an iterator that cannot be indexed
unit_list = list(zip(['bytes', 'kB', 'MB', 'GB', 'TB', 'PB'], [0, 0, 1, 2, 2, 2]))
def sizeof_fmt(num):
    """Human friendly file size"""
    if num > 1:
        exponent = min(int(log(num, 1024)), len(unit_list) - 1)
        quotient = float(num) / 1024**exponent
        unit, num_decimals = unit_list[exponent]
        format_string = '{:.%sf} {}' % (num_decimals)
        return format_string.format(quotient, unit)
    if num == 0:
        return '0 bytes'
    if num == 1:
        return '1 byte'

To make it more clear what is going on, we can omit the code for the string formatting. Here are the lines that actually do the work:

exponent = int(log(num, 1024))
quotient = num / 1024**exponent
unit_list[exponent]
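To see those three lines at work, here is a small worked example. The input value is borrowed from the top answer, and the plain units list is a simplification of unit_list introduced only for this illustration.

```python
from math import log

num = 168963795964                  # example size in bytes
exponent = int(log(num, 1024))      # how many whole powers of 1024 fit: 3
quotient = num / 1024**exponent     # size expressed in that unit, about 157.36
units = ['bytes', 'kB', 'MB', 'GB', 'TB', 'PB']

print(f"{quotient:.2f} {units[exponent]}")  # 157.36 GB
```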
joctee
  • 2
    while you talk about optimizing such a short code, why not use if/elif/else? The last check num==1 is unnecessary unless you expect negative file sizes. Otherwise: nice work, I like this version. – ted Sep 06 '12 at 06:50
  • 2
    My code could surely be more optimized. However, my point was to demonstrate that this task could be solved with constant complexity. – joctee Sep 08 '12 at 16:00
  • 46
    The answers with for loops are also O(1), because the for loops are bounded--their computation time doesn't scale with the size of the input (we don't have unbounded SI prefixes). – Thomas Minor Apr 17 '13 at 16:53
  • 1
    probably should add a comma for the formatting, so `1000` would show as `1,000 bytes`. – iTayb Jan 03 '14 at 09:14
  • 4
    Note that when using Python 3, zip returns an iterator, so you need to wrap it with list(). `unit_list = list(zip(['bytes', 'kB', 'MB', 'GB', 'TB', 'PB'], [0, 0, 1, 2, 2, 2]))` – donarb Feb 21 '18 at 21:46
  • 1
    Your code is inconsistent. It uses decimal suffixes but divides by 1024. – CodesInChaos Jun 14 '18 at 17:01
31

I recently came up with a version that avoids loops, using log2 to determine the size order which doubles as a shift and an index into the suffix list:

from math import log2

_suffixes = ['bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']

def file_size(size):
    # determine binary order in steps of size 10 
    # (coerce to int, // still returns a float)
    order = int(log2(size) / 10) if size else 0
    # format file size
    # (.4g results in rounded numbers for exact matches and max 3 decimals, 
    # should never resort to exponent values)
    return '{:.4g} {}'.format(size / (1 << (order * 10)), _suffixes[order])

Could well be considered unpythonic for its readability, though.
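For integer sizes, the same order can be obtained without floating-point logarithms by using int.bit_length(). The sketch below assumes a non-negative int and caps the order at the last suffix; the names SUFFIXES and file_size_int are hypothetical.

```python
SUFFIXES = ['bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']


def file_size_int(size):
    """Format a non-negative integer byte count using integer-only order logic."""
    # bit_length() - 1 equals floor(log2(size)) for size > 0; max() covers size == 0.
    order = min(max(size.bit_length() - 1, 0) // 10, len(SUFFIXES) - 1)
    return '{:.4g} {}'.format(size / (1 << (order * 10)), SUFFIXES[order])


print(file_size_int(0))      # 0 bytes
print(file_size_int(1536))   # 1.5 KiB
print(file_size_int(2**80))  # 1 YiB
```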

Jules G.M.
akaIDIOT
25

If you have Django installed, you can also try filesizeformat:

from django.template.defaultfilters import filesizeformat
filesizeformat(1073741824)

=>

"1.0 GB"
Jon Tirsen
  • 2
    One downside to this for me is that it uses GB instead of GiB even though it's dividing by 1024. – Pepedou Jun 07 '19 at 18:23
  • Never heard of GiB before, it seems silly. Does anyone ever use memory storage in real powers of 10^3 for anything useful? No-one says `Mebibyte`, we all say `MegaByte` – run_the_race Jan 14 '22 at 14:27
  • Yes, that's quite popular nowadays. For instance, Nautilus and Finder use SI-style prefixes. – Ignat Loskutov Apr 07 '22 at 13:14
14

You should use "humanize".

>>> import humanize
>>> humanize.naturalsize(1000000)
'1.0 MB'
>>> humanize.naturalsize(1000000, binary=True)
'976.6 KiB'
>>> humanize.naturalsize(1000000, gnu=True)
'976.6K'

Reference:

https://pypi.org/project/humanize/

10

One such library is hurry.filesize.

>>> from hurry.filesize import size, alternative
>>> size(1, system=alternative)
'1 byte'
>>> size(10, system=alternative)
'10 bytes'
>>> size(1024, system=alternative)
'1 KB'
Sridhar Ratnakumar
  • 4
    However, this library is not very customizable. >>> from hurry.filesize import size >>> size(1031053) >>> size(3033053) '2M' I expect it show, for example, '2.4M' or '2423K' .. instead of the blatantly approximated '2M'. – Sridhar Ratnakumar Jul 07 '09 at 21:06
  • Note also that it's very easy to just grab the code out of hurry.filesize and put it directly in your own code, if you're dealing with dependency systems and the like. It's about as short as the snippets people are providing here. – mlissner Oct 23 '11 at 03:03
  • @SridharRatnakumar, to address the over-approximation problem somewhat intelligently, please see my mathematical [**hack**](http://pastebin.com/DNNHkpZU). Can the approach be further improved upon? – Asclepius Jan 29 '15 at 00:25
8

Using either powers of 1000 or kibibytes would be more standard-friendly:

def sizeof_fmt(num, use_kibibyte=True):
    base, suffix = [(1000.,'B'),(1024.,'iB')][use_kibibyte]
    # list comprehension instead of map(), so the concatenation also works on Python 3
    for x in ['B'] + [p + suffix for p in 'kMGTP']:
        if -base < num < base:
            return "%3.1f %s" % (num, x)
        num /= base
    return "%3.1f %s" % (num, x)

P.S. Never trust a library that prints thousands with the K (uppercase) suffix :)

Giancarlo Sportelli
  • 1
    `P.S. Never trust a library that prints thousands with the K (uppercase) suffix :)` Why not? The code could be perfectly sound and the author just didn't consider the casing for kilo. It seems pretty asinine to automatically dismiss any code based on your rule... – Douglas Gaskell Feb 11 '18 at 02:21
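On the casing point: the SI prefix for 1000 is a lowercase k (kB), while the binary prefix for 1024 is Ki (KiB). A small sketch that keeps the two prefix sets apart; the function name and the binary flag are my own choices for illustration, not taken from the answer.

```python
def size_with_standard_prefixes(num, binary=True):
    """Format num with IEC binary prefixes (KiB, ...) or SI decimal prefixes (kB, ...)."""
    if binary:
        base, prefixes = 1024.0, ["", "Ki", "Mi", "Gi", "Ti", "Pi"]
    else:
        base, prefixes = 1000.0, ["", "k", "M", "G", "T", "P"]
    for prefix in prefixes[:-1]:
        if abs(num) < base:
            return f"{num:3.1f} {prefix}B"
        num /= base
    # Anything larger is expressed in the last available prefix.
    return f"{num:3.1f} {prefixes[-1]}B"


print(size_with_standard_prefixes(2048))                # 2.0 KiB
print(size_with_standard_prefixes(2048, binary=False))  # 2.0 kB
```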
7

This will do what you need in almost any situation, is customizable with optional arguments, and as you can see, is pretty much self-documenting:

from math import log
def pretty_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
    pow,n=min(int(log(max(n*b**pow,1),b)),len(pre)-1),n*b**pow
    return "%%.%if %%s%%s"%abs(pow%(-pow-1))%(n/b**float(pow),pre[pow],u)

Example output:

>>> pretty_size(42)
'42 B'

>>> pretty_size(2015)
'2.0 KiB'

>>> pretty_size(987654321)
'941.9 MiB'

>>> pretty_size(9876543210)
'9.2 GiB'

>>> pretty_size(0.5,pow=1)
'512 B'

>>> pretty_size(0)
'0 B'

Advanced customizations:

>>> pretty_size(987654321,b=1000,u='bytes',pre=['','kilo','mega','giga'])
'987.7 megabytes'

>>> pretty_size(9876543210,b=1000,u='bytes',pre=['','kilo','mega','giga'])
'9.9 gigabytes'

This code is both Python 2 and Python 3 compatible. PEP8 compliance is an exercise for the reader. Remember, it's the output that's pretty.

Update:

If you need thousands commas, just apply the obvious extension:

def prettier_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
    r,f=min(int(log(max(n*b**pow,1),b)),len(pre)-1),'{:,.%if} %s%s'
    return (f%(abs(r%(-r-1)),pre[r],u)).format(n*b**pow/b**float(r))

For example:

>>> prettier_size(987654321098765432109876543210)
'816,968.5 YiB'
gojomo
7

The HumanFriendly project helps with this.

import humanfriendly
humanfriendly.format_size(1024)

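The above code returns a short string such as '1 KB'; the exact text depends on the humanfriendly version, since newer releases default to powers of 1000 and take binary=True for powers of 1024.
Examples can be found here.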

zx485
arumuga abinesh
6

Riffing on the snippet provided as an alternative to hurry.filesize(), here is a snippet that gives varying precision numbers based on the prefix used. It isn't as terse as some snippets, but I like the results.

def human_size(size_bytes):
    """
    format a size in bytes into a 'human' file size, e.g. bytes, KB, MB, GB, TB, PB
    Note that bytes/KB will be reported in whole numbers but MB and above will have greater precision
    e.g. 1 byte, 43 bytes, 443 KB, 4.3 MB, 4.43 GB, etc
    """
    if size_bytes == 1:
        # because I really hate unnecessary plurals
        return "1 byte"

    suffixes_table = [('bytes',0),('KB',0),('MB',1),('GB',2),('TB',2), ('PB',2)]

    num = float(size_bytes)
    for suffix, precision in suffixes_table:
        if num < 1024.0:
            break
        num /= 1024.0

    if precision == 0:
        formatted_size = "%d" % num
    else:
        formatted_size = str(round(num, ndigits=precision))

    return "%s %s" % (formatted_size, suffix)
markltbaker
4

Drawing from all the previous answers, here is my take on it. It's an object which will store the file size in bytes as an integer. But when you try to print the object, you automatically get a human readable version.

class Filesize(object):
    """
    Container for a size in bytes with a human readable representation
    Use it like this::

        >>> size = Filesize(123123123)
        >>> print(size)
        117.4 MB
    """

    chunk = 1024
    units = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB']
    precisions = [0, 0, 1, 2, 2, 2]

    def __init__(self, size):
        self.size = size

    def __int__(self):
        return self.size

    def __str__(self):
        if self.size == 0: return '0 bytes'
        from math import log
        unit = self.units[min(int(log(self.size, self.chunk)), len(self.units) - 1)]
        return self.format(unit)

    def format(self, unit):
        if unit not in self.units: raise Exception("Not a valid file size unit: %s" % unit)
        if self.size == 1 and unit == 'bytes': return '1 byte'
        exponent = self.units.index(unit)
        quotient = float(self.size) / self.chunk**exponent
        precision = self.precisions[exponent]
        format_string = '{:.%sf} {}' % (precision)
        return format_string.format(quotient, unit)
xApple
4

Modern Django has a built-in template filter, filesizeformat:

Formats the value like a human-readable file size (i.e. '13 KB', '4.1 MB', '102 bytes', etc.).

For example:

{{ value|filesizeformat }}

If value is 123456789, the output would be 117.7 MB.

More info: https://docs.djangoproject.com/en/1.10/ref/templates/builtins/#filesizeformat

METAJIJI
2

I like the fixed precision of senderle's decimal version, so here's a sort of hybrid of that with joctee's answer above (did you know you could take logs with non-integer bases?):

from math import log
def human_readable_bytes(x):
    # hybrid of https://stackoverflow.com/a/10171475/2595465
    #      with https://stackoverflow.com/a/5414105/2595465
    if x == 0: return '0'
    magnitude = int(log(abs(x),10.24))
    if magnitude > 16:
        format_str = '%iP'
        illion = 5  # cap at peta (1024**5); without this, illion is undefined below
    else:
        float_fmt = '%2.1f' if magnitude % 3 == 1 else '%1.2f'
        illion = (magnitude + 1) // 3
        format_str = float_fmt + ['', 'K', 'M', 'G', 'T', 'P'][illion]
    return (format_str % (x * 1.0 / (1024 ** illion))).lstrip('0')
Community
HST
2

To get the file size in a human readable form, I created this function:

import os

def get_size(path):
    size = os.path.getsize(path)
    if size < 1024:
        return f"{size} bytes"
    elif size < pow(1024,2):
        return f"{round(size/1024, 2)} KB"
    elif size < pow(1024,3):
        return f"{round(size/(pow(1024,2)), 2)} MB"
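    elif size < pow(1024,4):
        return f"{round(size/(pow(1024,3)), 2)} GB"
    else:
        # fall back to TB so the function never returns None for very large files
        return f"{round(size/(pow(1024,4)), 2)} TB"

>>> get_size("a.txt")
'1.4 KB'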
jak bin
2

Here is a one-liner lambda without any imports that converts to a human-readable file size. Pass the value in bytes.

to_human = lambda v : str(v >> ((max(v.bit_length()-1, 0)//10)*10)) +["", "K", "M", "G", "T", "P", "E"][max(v.bit_length()-1, 0)//10]
>>> to_human(1024)
'1K'
>>> to_human(1024*1024*3)
'3M'
lxkarthi
1

How about a simple 2 liner:

import math

def humanizeFileSize(filesize):
    p = int(math.floor(math.log(filesize, 2)/10))
    return "%.3f%s" % (filesize/math.pow(1024,p), ['B','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

Here is how it works under the hood:

  1. Calculates log2(filesize)
  2. Divides it by 10 to get the closest unit. (e.g. if the size is 5000 bytes, the closest unit is KiB, so the answer should be X KiB)
  3. Returns file_size/value_of_closest_unit along with unit.
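The three steps can be traced for the 5000-byte example mentioned above. This is only an illustrative walk-through; the intermediate variable names are mine.

```python
import math

filesize = 5000
log2_size = math.log(filesize, 2)          # step 1: about 12.29
p = int(math.floor(log2_size / 10))        # step 2: 1, i.e. the KiB slot
value = filesize / math.pow(1024, p)       # step 3: 5000 / 1024
units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']

print("%.3f%s" % (value, units[p]))        # 4.883KiB
```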

It however doesn't work if filesize is 0 or negative (because log is undefined for 0 and -ve numbers). You can add extra checks for them:

def humanizeFileSize(filesize):
    filesize = abs(filesize)
    if (filesize==0):
        return "0 Bytes"
    p = int(math.floor(math.log(filesize, 2)/10))
    return "%0.2f %s" % (filesize/math.pow(1024,p), ['Bytes','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

Examples:

>>> humanizeFileSize(538244835492574234)
'478.06 PiB'
>>> humanizeFileSize(-924372537)
'881.55 MiB'
>>> humanizeFileSize(0)
'0 Bytes'

NOTE - There is a difference between KB and KiB. KB means 1000 bytes, whereas KiB means 1024 bytes. KB, MB, GB are all powers of 1000, whereas KiB, MiB, GiB etc. are all powers of 1024. More about it here

Community
jerrymouse
1

What you're about to find below is by no means the most performant or shortest solution among the ones already posted. Instead, it focuses on one particular issue that many of the other answers miss.

Namely the case when input like 999_995 is given:

Python 3.6.1 ...
...
>>> value = 999_995
>>> base = 1000
>>> math.log(value, base)
1.999999276174054

which, being truncated to the nearest integer and applied back to the input gives

>>> order = int(math.log(value, base))
>>> value/base**order
999.995

This seems to be exactly what we'd expect until we're required to control output precision. And this is when things start to get a bit difficult.

With the precision set to 2 digits we get:

>>> round(value/base**order, 2)
1000 # K

instead of 1M.

How can we counter that?

Of course, we can check for it explicitly:

if round(value/base**order, 2) == base:
    order += 1
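Wrapped into a complete function, the explicit check might look like the sketch below. The name abbreviate_explicit is hypothetical, and the default suffixes are assumed to be the same ones used further down in this answer.

```python
import math


def abbreviate_explicit(value, base=1000, precision=2,
                        suffixes=('', 'K', 'M', 'B', 'T')):
    """Abbreviate value, bumping the order when rounding lands exactly on base."""
    if value == 0:
        return f'0{suffixes[0]}'
    order_max = len(suffixes) - 1
    order = min(int(math.log(abs(value), base)), order_max)
    factored = round(value / base**order, precision)
    # The explicit check: 999.995 rounds to 1000.0, which belongs to the next order.
    if abs(factored) == base and order < order_max:
        order += 1
        factored = round(value / base**order, precision)
    return f'{factored:,g}{suffixes[order]}'


print(abbreviate_explicit(999_994))  # 999.99K
print(abbreviate_explicit(999_995))  # 1M
```

The cost of this approach is a second division and rounding in the boundary case only.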

But can we do better? Can we get to know which way the order should be cut before we do the final step?

It turns out we can.

Assuming 0.5 decimal rounding rule, the above if condition translates into:

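$$\log_{\text{base}}|\text{value}| - \left\lfloor \log_{\text{base}}|\text{value}| \right\rfloor \;\ge\; \log_{\text{base}}\!\left(\text{base} - \frac{0.5}{10^{\text{precision}}}\right)$$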

resulting in

from math import log

def abbreviate(value, base=1000, precision=2, suffixes=None):
    if suffixes is None:
        suffixes = ['', 'K', 'M', 'B', 'T']

    if value == 0:
        return f'{0}{suffixes[0]}'

    order_max = len(suffixes) - 1
    order = log(abs(value), base)
    order_corr = order - int(order) >= log(base - 0.5/10**precision, base)
    order = min(int(order) + order_corr, order_max)

    factored = round(value/base**order, precision)

    return f'{factored:,g}{suffixes[order]}'

giving

>>> abbreviate(999_994)
'999.99K'
>>> abbreviate(999_995)
'1M'
>>> abbreviate(999_995, precision=3)
'999.995K'
>>> abbreviate(2042, base=1024)
'1.99K'
>>> abbreviate(2043, base=1024)
'2K'
ayorgo
  • That was a very good read, and it was fun to see your mathematical algorithm. Unfortunately it's slow, as you pointed out. I have previously solved this issue in a high-performance way in the following post: https://stackoverflow.com/a/63839503/8874388 – Mitch McMabers Sep 11 '20 at 01:48
0
def human_readable_data_quantity(quantity, multiple=1024):
    if quantity == 0:
        quantity = +0
    SUFFIXES = ["B"] + [i + {1000: "B", 1024: "iB"}[multiple] for i in "KMGTPEZY"]
    for suffix in SUFFIXES:
        if quantity < multiple or suffix == SUFFIXES[-1]:
            if suffix == SUFFIXES[0]:
                return "%d%s" % (quantity, suffix)
            else:
                return "%.1f%s" % (quantity, suffix)
        else:
            quantity /= multiple
Matt Joiner
0

This feature is available in Boltons, which is a very handy library to have for most projects.

>>> from boltons.strutils import bytes2human
>>> bytes2human(128991)
'126K'
>>> bytes2human(100001221)
'95M'
>>> bytes2human(0, 2)
'0.00B'
cmcginty
0

Here's something I wrote for a different question...

Much like xApple's answer, this object will always print in a human-readable format. The difference is that it's also a proper int, so you can do math with it! It passes the format specifier straight through to the number format and tacks on the suffix, so it's pretty much guaranteed that the requested length will be exceeded by two or three characters. I've never had a use for this code, so I haven't bothered to fix it!


class ByteSize(int):

    _KB = 1024
    _suffixes = 'B', 'KB', 'MB', 'GB', 'PB'

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        self.bytes = self.B = int(self)
        self.kilobytes = self.KB = self / self._KB**1
        self.megabytes = self.MB = self / self._KB**2
        self.gigabytes = self.GB = self / self._KB**3
        self.petabytes = self.PB = self / self._KB**4
        *suffixes, last = self._suffixes
        suffix = next((
            suffix
            for suffix in suffixes
            if 1 < getattr(self, suffix) < self._KB
        ), last)
        self.readable = suffix, getattr(self, suffix)

        super().__init__()

    def __str__(self):
        return self.__format__('.2f')

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, super().__repr__())

    def __format__(self, format_spec):
        suffix, val = self.readable
        return '{val:{fmt}} {suf}'.format(val=val, fmt=format_spec, suf=suffix)

    def __sub__(self, other):
        return self.__class__(super().__sub__(other))

    def __add__(self, other):
        return self.__class__(super().__add__(other))
    
    def __mul__(self, other):
        return self.__class__(super().__mul__(other))

    def __rsub__(self, other):
        return self.__class__(super().__rsub__(other))

    def __radd__(self, other):
        return self.__class__(super().__add__(other))
    
    def __rmul__(self, other):
        return self.__class__(super().__rmul__(other))   

Usage:

>>> size = ByteSize(6239397620)
>>> print(size)
5.81 GB
>>> size.GB
5.810891855508089
>>> size.gigabytes
5.810891855508089
>>> size.PB
0.005674699077644618
>>> size.MB
5950.353260040283
>>> size
ByteSize(6239397620)
Terry Davis
0

In case someone is wondering, to convert @Sridhar Ratnakumar's answer back to bytes you could do the following:

import math

def format_back_to_bytes(value):
    for power, unit in enumerate(["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"]):
        if value[-3:-1] == unit:
            return round(float(value[:-3])*math.pow(2, 10*power))

Usage:

>>> format_back_to_bytes('212.4GiB')
228062763418
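The slicing above assumes a two-character prefix, so plain byte strings such as '5.0B' and values with a space before the unit fall through and return None. A regex-based sketch that also accepts those forms is shown below; parse_size and _UNITS are hypothetical names, and only the IEC binary prefixes are handled.

```python
import re

_UNITS = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]


def parse_size(text):
    """Convert strings like '212.4GiB', '2.0 KiB' or '5.0B' back to a byte count."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(|[KMGTPEZY]i)B\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size string: {text!r}")
    number, unit = match.groups()
    # The position of the prefix in _UNITS is the power of 1024 to multiply by.
    return round(float(number) * 1024 ** _UNITS.index(unit))


print(parse_size('212.4GiB'))  # 228062763418
print(parse_size('5.0B'))      # 5
```

Because the formatted string only keeps one decimal, the round trip recovers an approximation of the original byte count, not the exact value.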
Vinícius Queiroz
-1

Referencing Sridhar Ratnakumar's answer, updated to:

def formatSize(sizeInBytes, decimalNum=1, isUnitWithI=False, sizeUnitSeperator=""):
  """format size to human readable string"""
  # https://en.wikipedia.org/wiki/Binary_prefix#Specific_units_of_IEC_60027-2_A.2_and_ISO.2FIEC_80000
  # K=kilo, M=mega, G=giga, T=tera, P=peta, E=exa, Z=zetta, Y=yotta
  sizeUnitList = ['','K','M','G','T','P','E','Z']
  largestUnit = 'Y'

  if isUnitWithI:
    sizeUnitListWithI = []
    for curIdx, eachUnit in enumerate(sizeUnitList):
      unitWithI = eachUnit
      if curIdx >= 1:
        unitWithI += 'i'
      sizeUnitListWithI.append(unitWithI)

    # sizeUnitListWithI = ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']
    sizeUnitList = sizeUnitListWithI

    largestUnit += 'i'

  suffix = "B"
  decimalFormat = "." + str(decimalNum) + "f" # ".1f"
  finalFormat = "%" + decimalFormat + sizeUnitSeperator + "%s%s" # "%.1f%s%s"
  sizeNum = sizeInBytes
  for sizeUnit in sizeUnitList:
      if abs(sizeNum) < 1024.0:
        return finalFormat % (sizeNum, sizeUnit, suffix)
      sizeNum /= 1024.0
  return finalFormat % (sizeNum, largestUnit, suffix)

and example output is:

def testKb():
  kbSize = 3746
  kbStr = formatSize(kbSize)
  print("%s -> %s" % (kbSize, kbStr))

def testI():
  iSize = 87533
  iStr = formatSize(iSize, isUnitWithI=True)
  print("%s -> %s" % (iSize, iStr))

def testSeparator():
  seperatorSize = 98654
  seperatorStr = formatSize(seperatorSize, sizeUnitSeperator=" ")
  print("%s -> %s" % (seperatorSize, seperatorStr))

def testBytes():
  bytesSize = 352
  bytesStr = formatSize(bytesSize)
  print("%s -> %s" % (bytesSize, bytesStr))

def testMb():
  mbSize = 76383285
  mbStr = formatSize(mbSize, decimalNum=2)
  print("%s -> %s" % (mbSize, mbStr))

def testTb():
  tbSize = 763832854988542
  tbStr = formatSize(tbSize, decimalNum=2)
  print("%s -> %s" % (tbSize, tbStr))

def testPb():
  pbSize = 763832854988542665
  pbStr = formatSize(pbSize, decimalNum=4)
  print("%s -> %s" % (pbSize, pbStr))


def demoFormatSize():
  testKb()
  testI()
  testSeparator()
  testBytes()
  testMb()
  testTb()
  testPb()

  # 3746 -> 3.7KB
  # 87533 -> 85.5KiB
  # 98654 -> 96.3 KB
  # 352 -> 352.0B
  # 76383285 -> 72.84MB
  # 763832854988542 -> 694.70TB
  # 763832854988542665 -> 678.4199PB
TylerH
crifan
-1

Here is an option using while:

def number_format(n):
   n2, n3 = n, 0
   while n2 >= 1e3:
      n2 /= 1e3
      n3 += 1
   return '%.3f' % n2 + ('', ' k', ' M', ' G')[n3]

s = number_format(9012345678)
print(s == '9.012 G')

https://docs.python.org/reference/compound_stmts.html#while

Zombo
-2

This solution might also appeal to you, depending on how your mind works:

from pathlib import Path    

def get_size(path = Path('.')):
    """ Gets file size, or total directory size """
    if path.is_file():
        size = path.stat().st_size
    elif path.is_dir():
        size = sum(file.stat().st_size for file in path.glob('*.*'))
    return size

def format_size(path, unit="MB"):
    """ Converts integers to common size units used in computing """
    bit_shift = {"B": 0,
            "kb": 7,
            "KB": 10,
            "mb": 17,
            "MB": 20,
            "gb": 27,
            "GB": 30,
            "TB": 40,}
    # wrap in Path so that plain string paths also work with get_size()
    return "{:,.0f}".format(get_size(Path(path)) / float(1 << bit_shift[unit])) + " " + unit

# Tests and test results
>>> format_size("d:\\media\\bags of fun.avi")
'38 MB'
>>> format_size("d:\\media\\bags of fun.avi","KB")
'38,763 KB'
>>> format_size("d:\\media\\bags of fun.avi","kb")
'310,104 kb'
Peter F