19

I find hurry.filesize very useful, but it doesn't give output in decimals.

For example:

print size(4026, system=alternative) gives 3 KB.

But later, when I add all the values, I don't get the exact sum. For example, say the output of hurry.filesize is stored in 4 variables and each value is 3. If I add them all, I get 15 as output.

I am looking for an alternative to hurry.filesize that gives output in decimals too.

pynovice

4 Answers

62

This isn't really hard to implement yourself:

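suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
def humansize(nbytes):
    i = 0
    # Divide by 1024 until the value fits the unit or the suffixes run out.
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    # Format with at most two decimals, dropping trailing zeros.
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])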

Examples:

>>> humansize(131)
'131 B'
>>> humansize(1049)
'1.02 KB'
>>> humansize(58812)
'57.43 KB'
>>> humansize(68819826)
'65.63 MB'
>>> humansize(39756861649)
'37.03 GB'
>>> humansize(18754875155724)
'17.06 TB'
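
On the summing problem from the question: one option is to keep the raw byte counts, add those, and format only the total. A minimal sketch, repeating humansize from above so that it runs on its own; the `sizes` list is made-up sample data, not something from the question:

```python
suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def humansize(nbytes):
    # Same logic as above: divide by 1024 until the value fits the unit.
    i = 0
    while nbytes >= 1024 and i < len(suffixes) - 1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

sizes = [4026, 4026, 4026, 4026]  # hypothetical sample data, in bytes

# Format each value for display, but add up the raw byte counts.
labels = [humansize(n) for n in sizes]
total = humansize(sum(sizes))
print(labels)  # each entry is '3.93 KB'
print(total)   # '15.73 KB'
```

Adding the formatted strings (or their rounded numbers) would lose the fractional part of every term, which is why the total is computed from the bytes.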
HelloGoodbye
nneonneo
  • That's it, but the problem is: if the output is 4.12 it gives 4.12, which is fine, but if the output is 4 it gives 4.00. I only want 4 if there's no decimal. – pynovice Feb 21 '13 at 08:32
  • Added `round` so that you get only two decimal places. – nneonneo Feb 21 '13 at 08:45
  • 6
    Being a bit pedantic, but you might want to use KiB, MiB etc. … – filmor Feb 21 '13 at 08:47
  • 2
    I happen to grossly dislike those suffixes. But, as the code is quite clear, it should not be hard to change that if you really want ;) – nneonneo Feb 21 '13 at 08:48
  • Bleh. Should be fixed now. – nneonneo Feb 21 '13 at 09:05
  • There are some issues with this solution: `humansize(1000) -> 1B`, `humansize(512000) -> 5 KB` because you're deleting all the trailing zeros and also I'm not sure how the OP wants to treat the `humansize(1001)`, because this solution transforms it to `1001 B` while I would expect `0.98 KB` – Lipis Feb 21 '13 at 09:58
  • 4
    ...[formatting is hard](http://stackoverflow.com/questions/14997799/most-pythonic-way-to-print-at-most-some-number-of-decimal-places). – nneonneo Feb 21 '13 at 10:10
9

Disclaimer: I wrote the package I'm about to describe

The module bitmath supports the functionality you've described. It also addresses the comment made by @filmor that, semantically, we should be using NIST unit prefixes (not SI), that is to say, MiB instead of MB. Rounding is now supported as well.

You originally asked about:

print size(4026, system=alternative)

In bitmath the default prefix-unit system is NIST (1024 based), so, assuming you were referring to 4026 bytes, the equivalent solution in bitmath would look like either of the following:

In [1]: import bitmath

In [2]: print bitmath.Byte(bytes=4026).best_prefix()
3.931640625KiB

In [3]: human_prefix = bitmath.Byte(bytes=4026).best_prefix()

In [4]: print human_prefix.format("{value:.2f} {unit}")
3.93 KiB

I currently have an open task to allow the user to select a preferred prefix-unit system when using the best_prefix method.

Update (2014-07-16): The latest package has been uploaded to PyPI, and it includes several new features (the full feature list is on the GitHub page).

Tim Bielawa
  • 3
    I understand that some developers argue that this is "too small a task to require a library", but I disagree, because that causes developers to keep reinventing the wheel. Anyway, this library was just what I was looking for! Thanks Tim! – A. K. Tolentino Mar 09 '16 at 07:51
  • This is not reinventing the wheel, just learning it :) Packages for such things are nonsense. A snippet is well enough. Packages that contain nothing but a method are polluting repositories and dependencies. This is the kind of package that too many people will use out of laziness, and one day the author decides to change something... – Romain Vincent Aug 30 '18 at 18:36
  • 1
    @RomainVincent having the same code copied and pasted over possibly thousands of projects is what I'd call polluting. Especially if one later happens to discover a corner case leading to a bug, getting every instance of that code fixed is a nightmare. For example the code in the [accepted answer by nneonneo](https://stackoverflow.com/a/14996816/3423324) here was improved 4 years after initial release. In this case that's not a big change, but it illustrates the point. Not everyone who copied that in the past might have noticed that a change was even made. – luckydonald Oct 10 '19 at 03:17
  • Well, yes. But only if you don't understand what you are doing in the first place. Which really is why there could be a bug on something as simple and straightforward as this. It's a math formula, it's not communication over a complex protocol, or dealing with the code of a ... library. What you call possibility of free bug fix, I call danger of someone introducing unexpected changes in your own code. – Romain Vincent Oct 11 '19 at 20:13
5

This is not necessarily faster than the @nneonneo solution, it's just a bit cooler, if I can say that :)

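import math

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def human_size(nbytes):
    human = nbytes
    rank = 0
    if nbytes != 0:
        # Pick the unit from the number of decimal digits (groups of three),
        # capped at the largest suffix.
        rank = int((math.log10(nbytes)) / 3)
        rank = min(rank, len(suffixes) - 1)
        # Scale by powers of 1024, so 1001 bytes comes out as 0.98 KB.
        human = nbytes / (1024.0 ** rank)
    # Format with at most two decimals, dropping trailing zeros.
    f = ('%.2f' % human).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[rank])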

This works because the integer part of the base-10 logarithm of a positive integer is one less than its number of digits, so dividing it by 3 counts whole groups of three digits. The rest is pretty much straightforward.
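
As a comment on this answer points out, the rank here is effectively a base-1000 logarithm while the scaling uses 1024. If base-1024 thresholds are preferred, one possible variant is sketched below. It is not part of the original answer, and the name `human_size_1024` is made up for illustration. It takes the rank from the integer bit length, which sidesteps floating-point logarithms:

```python
suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def human_size_1024(nbytes):
    # Hypothetical variant: rank = floor(log base 1024 of nbytes).
    rank = 0
    if nbytes >= 1:
        # bit_length() - 1 is floor(log2(n)) for a positive int;
        # every 10 bits corresponds to one factor of 1024.
        rank = (int(nbytes).bit_length() - 1) // 10
        rank = min(rank, len(suffixes) - 1)
    human = nbytes / (1024.0 ** rank)
    # Format with at most two decimals, dropping trailing zeros.
    f = ('%.2f' % human).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[rank])

print(human_size_1024(1001))  # '1001 B'
print(human_size_1024(4026))  # '3.93 KB'
```

With this variant 1001 bytes stays '1001 B' instead of becoming '0.98 KB'; which of the two behaviours is preferable is a matter of taste.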

Community
Lipis
  • 1
    You take rank as the base-1000 log instead of the base-1024 log. Why not use `math.log(nbytes, 1024)` instead? It is more obvious what that does. – nneonneo Jun 08 '15 at 16:13
  • 1
    math.log10 call fails with "ValueError: math domain error" if nbytes == 0. You need an extra check: `if nbytes == 0: return "0 B"` – foolo Nov 20 '16 at 13:12
1

I used to reinvent the wheel every time I wrote a little script or ipynb or whatever. It got trite, so I wrote the datasize python module. I'm posting this here because I just updated it, and wow have the Python versions moved up!

It is a DataSize class that subclasses int, so arithmetic just works. However, arithmetic returns a plain int, because I use it with Pandas and some numpy, and I didn't want to slow things down when there is Python<-->C++ translation for matrix math libraries.

You can construct a DataSize object using a string with either SI or NIST suffixes in either bits or bytes, and even weird word lengths if you need to work with data for embedded tech that uses those. The DataSize object has an intuitive format() code syntax for human-readable representation. Internally the value is just an integer count of 8-bit bytes.

For example:

>>> from datasize import DataSize
>>> 'My new {:GB} SSD really only stores {:.2GiB} of data.'.format(DataSize('750GB'),DataSize(DataSize('750GB') * 0.8))
'My new 750GB SSD really only stores 558.79GiB of data.'
Jeremy