
I want to print a file size in C++. My input is in bytes, and I want to print it in KiB once it reaches 1024, in MiB once it reaches 1024*1024, etc. Alternatively, it should print in KB from 1000 upward, and so on.

It should also have a fractional part so that I can distinguish between 1.5 GiB and 1.2 GiB.

What I know is that I can use the logarithm to compute which unit to choose. So if log_1024(x) >= 1 then it should be in KiB. This way I could avoid an unnecessary loop.

Also, I have a function for printing the fractions already:

std::string stringifyFraction(unsigned numerator,
                              unsigned denominator,
                              unsigned precision);
Jan Schultke

1 Answer


The logarithm base 1000 or 1024 can indeed be used to determine the right unit. We only need the integral part of the logarithm, i.e. the part in front of the decimal point. On modern hardware, the integer logarithm can be computed in O(1), so this will be slightly faster than using a for loop to get to the right unit. The linked post on computing the integer logarithm explains how to do this efficiently.
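For orientation, here is a minimal, portable sketch of what the integer logarithm returns. It is an assumption for illustration, not the O(1) technique from the linked post: it uses a plain division loop, which runs at most six times for a 64-bit input with base 1000 or 1024. The name `logFloor` mirrors the call used further down, but this body is only a stand-in.

```cpp
#include <cstdint>

// Hypothetical stand-in: returns floor(log_BASE(value)), and 0 for value == 0.
// This is a simple baseline, not the O(1) approach from the linked post.
template <std::uint64_t BASE>
constexpr unsigned logFloor(std::uint64_t value) noexcept
{
    static_assert(BASE >= 2, "the base must be at least 2");
    unsigned result = 0;
    // Dividing (instead of multiplying a running power) cannot overflow.
    while (value >= BASE) {
        value /= BASE;
        ++result;
    }
    return result;
}

// Compile-time checks of the unit boundaries.
static_assert(logFloor<1024>(1023) == 0, "below 1 KiB stays in bytes");
static_assert(logFloor<1024>(1024) == 1, "1024 bytes is 1 KiB");
static_assert(logFloor<1000>(1000000) == 2, "10^6 bytes is 1 MB");
```

The result is used directly as an index: 0 selects bytes, 1 selects the kilo unit, and so on.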

If the integral part is 0, we print in B, for 1 in KiB, etc. We can create a lookup table where the key is our logarithm:

constexpr const char FILE_SIZE_UNITS[8][3]{
    "B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"
};

Note that the table uses 3 as the inner size because each string holds up to two characters plus the null terminator. You might also be wondering why the lookup table doesn't contain KiB units. This is because the i in the middle is constant and doesn't need to be part of the table. Also, there are two different unit systems for file sizes, one which is base 1000 and one which is base 1024. See Files size units: “KiB” vs “KB” vs “kB”. We can easily support both in one function.
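To make the point about the constant i concrete, here is a small sketch of assembling the suffix at run time. It is a variation for illustration, not the code used in this answer: the helper name `unitSuffix`, the `binary` flag, and the choice to store only the prefix letters are all assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the unit suffix for a given power index.
// unit 0 yields "B"; binary units read "KiB", "MiB", ...;
// decimal units read "kB", "MB", ...
inline std::string unitSuffix(unsigned unit, bool binary)
{
    constexpr char PREFIXES[] = "KMGTPEZ";  // prefix letters for units 1 to 7
    assert(unit <= 7);

    std::string suffix;
    if (unit != 0) {
        char prefix = PREFIXES[unit - 1];
        if (!binary && unit == 1) {
            prefix = 'k';  // the SI kilo prefix is lower case
        }
        suffix.push_back(prefix);
        if (binary) {
            suffix.push_back('i');  // the constant middle letter
        }
    }
    suffix.push_back('B');
    return suffix;
}
```

Because the i and the trailing B never change, only one letter per unit has to be looked up.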

We can then implement our stringifyFileSize function as follows:

// use SFINAE to only allow base 1000 or 1024
template <size_t BASE = 1024,
    std::enable_if_t<BASE == 1000 || BASE == 1024, int> = 0>
std::string stringifyFileSize(uint64_t size, unsigned precision = 0) noexcept
{
    constexpr const char FILE_SIZE_UNITS[8][3]{
        "B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"
    };

    // The linked post about computing the integer logarithm
    // explains how to compute this.
    // This is equivalent to making a table: {1, 1000, 1000 * 1000, ...}
    // or {1, 1024, 1024 * 1024, ...}
    constexpr auto powers = makePowerTable<uint64_t, BASE>();

    unsigned unit = logFloor<BASE>(size);

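    // Your numerator is size, your denominator is 1000^unit or 1024^unit.
    // Caution: both are 64-bit here. The `unsigned` parameters from the question
    // would truncate sizes of 4 GiB and above where unsigned is 32 bits wide.
    std::string result = stringifyFraction(size, powers[unit], precision);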
    result.reserve(result.size() + 5);

    // Optional: Space separating number from unit. (usually looks better)
    result.push_back(' ');
    char first = FILE_SIZE_UNITS[unit][0];
    // Optional: Use the SI spelling "kB" for the decimal kilo prefix.
    // Only kilo is lower case; "B", "MB", "GB", etc. keep their upper case letter.
    if (BASE == 1000 && unit == 1) {
        first = 'k';
    }
    result.push_back(first);

    // Don't insert anything more in case of single bytes.
    if (unit != 0) {
        if constexpr (BASE == 1024) {
            result.push_back('i');
        }
        result.push_back(FILE_SIZE_UNITS[unit][1]);
    }

    return result;
}
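The function above relies on `makePowerTable` and `stringifyFraction`, neither of which is defined in this answer. The following is a minimal sketch of what they could look like, assuming C++17. Both bodies are hypothetical stand-ins: the real `makePowerTable` is in the linked post, and the real `stringifyFraction` belongs to the question. This variant takes 64-bit arguments and truncates the fraction instead of rounding it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical: number of powers BASE^0, BASE^1, ... that fit into Uint.
template <typename Uint, Uint BASE>
constexpr std::size_t powerCount() noexcept
{
    static_assert(BASE >= 2, "the base must be at least 2");
    std::size_t count = 1;
    for (Uint power = 1; power <= std::numeric_limits<Uint>::max() / BASE; power *= BASE) {
        ++count;
    }
    return count;
}

// Hypothetical: table {1, BASE, BASE * BASE, ...} of all powers that fit into Uint.
template <typename Uint, Uint BASE>
constexpr auto makePowerTable() noexcept
{
    std::array<Uint, powerCount<Uint, BASE>()> table{};
    Uint power = 1;
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = power;
        if (i + 1 < table.size()) {
            power *= BASE;  // guarded so the final step cannot overflow
        }
    }
    return table;
}

// Hypothetical 64-bit stand-in: prints numerator / denominator with
// `precision` decimal digits, truncated (not rounded).
inline std::string stringifyFraction(std::uint64_t numerator,
                                     std::uint64_t denominator,
                                     unsigned precision)
{
    assert(denominator != 0);
    // Keeps `remainder * 10` below from overflowing.
    assert(denominator <= std::numeric_limits<std::uint64_t>::max() / 10);

    std::string result = std::to_string(numerator / denominator);
    std::uint64_t remainder = numerator % denominator;
    if (precision != 0) {
        result.push_back('.');
    }
    for (unsigned i = 0; i < precision; ++i) {
        remainder *= 10;
        result.push_back(static_cast<char>('0' + remainder / denominator));
        remainder %= denominator;
    }
    return result;
}
```

With these definitions, `stringifyFraction(1536, 1024, 1)` yields "1.5", which is the numeric part of "1.5 KiB". Both tables hold seven entries for 64-bit input, since 1024^6 and 1000^6 are the largest powers that fit.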
Jan Schultke
  • `KB` (which is the same as `KiB`) is wrong since the other prefixes are for when working with powers of 10. It should say `kB` in `FILE_SIZE_UNITS`. I think hardcoding two separate lookup tables would improve things. One for powers of 2 and one for powers of 10. – Ted Lyngmo Aug 20 '20 at 20:12
  • @TedLyngmo `KB` is only falsely assumed to be the same as `KiB`. And [Google would disagree with you](https://www.google.co.in/search?newwindow=1&q=1000+KiB+to+KB&cad=h). Anyhow, this is somewhat opinionated, but I would agree that using the lower case would decrease ambiguity. – Jan Schultke Aug 20 '20 at 20:20
  • `K` means 1024 of something and has done so since computer scientists realized that they needed a way to disambiguate what they used and the normal kilo (k). If google says otherwise, google is wrong. "_KB is only falsely assumed to be the same as KiB_" is wrong too. [Binary Prefix table](https://en.wikipedia.org/wiki/Binary_prefix) - The only ambiguity is for `M` and `G` where the JEDEC names for the power of 2 units are colliding with the SI units for powers of 10. – Ted Lyngmo Aug 20 '20 at 20:28