37

I expect there is already an answer for this on Stack Overflow, and I simply failed to find it.

Desired outcome: Quickly convert the file size element in a file.info() call from bytes to KB, MB, etc. I'm fine if the output is either i) a character string with the desired size type, e.g., "96 bytes" or ii) simply a numeric conversion, e.g., from 60963 bytes to 60.963 KB (per Google).

Repro steps:

  1. Create a folder to store the file:

    dir.create("census-app/data")
    
  2. Download the file (~60KB):

    download.file("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
    "census-app/data/counties.rds")
    
  3. Use file.info()$size to return the file size in bytes:

    file.info("census-app//data//counties.rds")$size
    [1] 60963
    

From there, I'm stuck. I realize I can do some complicated/manual parsing and calculation to make the conversion (see Converting kilobytes, megabytes etc. to bytes in R).

However, I'm hoping I can simply use a base function or something similar:

    format(file.info("census-app//data//counties.rds")$size, units = "KB")
    [1] "60963"
    # Attempt to return file size in KB simply returns the size in bytes
    # NOTE: format(x, units = "KB") works fine when I
    # pass it object.size() for an object loaded in R
zx8754
Daniel Fletcher
  • An apparently removed comment made a valid point I'd like to answer: Why not just use the _simple_ math of `x bytes / 1024` to return the value in KB? I agree this is a simple calculation and part of my goal is to avoid manual intervention a) in case I accidentally enter something like 1000, instead of 1024 and b) to forgo researching the correct conversion ratio. – Daniel Fletcher Apr 22 '15 at 04:24
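One way to address the "accidentally typed 1000 instead of 1024" worry with plain arithmetic is to keep the divisors in a named lookup. This is only a minimal sketch: `byte_units` and `convert_bytes()` are hypothetical names made up for illustration, not base R functions.

```r
# Named multipliers so the divisor is never typed by hand.
byte_units <- c(
  B   = 1,
  kB  = 1000, MB  = 1000^2, GB  = 1000^3,   # SI (decimal) units
  KiB = 1024, MiB = 1024^2, GiB = 1024^3    # IEC (binary) units
)

# Hypothetical helper: convert a numeric vector of byte counts to `unit`.
convert_bytes <- function(bytes, unit = "KiB") {
  unit <- match.arg(unit, names(byte_units))  # rejects unknown unit names
  bytes / byte_units[[unit]]
}

size_kb  <- convert_bytes(60963, "kB")   # 60.963
size_kib <- convert_bytes(60963, "KiB")  # about 59.53
```

This returns plain numbers (option ii in the question); the unit label is left to the caller.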

2 Answers

52

The object.size() function does this type of formatting for its results, but it's meant to tell you the size of the R object you pass to it. It is not set up to take an arbitrary byte value.

However, we can "steal" some of its formatting logic. You can call it with

    utils:::format.object_size(60963, "auto")
    # [1] "59.5 Kb"

In that way we can call the un-exported formatting function. You can bring up the additional formatting options on the ?format.object_size help page. Note that it uses the rule that 1 Kb = 1024 bytes (not 1000 as in your example).
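Applied to a file size, that could look roughly like the sketch below. To keep it self-contained it writes a temporary file of 60963 bytes as a stand-in for counties.rds (that stand-in is an assumption of the example); with the real file you would pass its path to file.info() instead.

```r
# Write a temporary file of known size so the example runs anywhere.
path <- tempfile(fileext = ".bin")
writeBin(raw(60963), path)           # 60963 zero bytes, same size as counties.rds

size_bytes <- file.info(path)$size   # numeric size in bytes
size_bytes
# [1] 60963

# Un-exported formatter from utils; "auto" picks the unit.
pretty_size <- utils:::format.object_size(size_bytes, "auto")
pretty_size
# [1] "59.5 Kb"

unlink(path)                         # clean up the temporary file
```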

MrFlick
  • 1
    Thank you, sir! When I pull up `?format.object_size` the help page points to `object.size {utils}`. Will you please explain how I know when to expand a function like `object.size()` to `some_function.object_size` or point me to an explanatory resource? I'm inferring this is a simple combination of the two functions, and I'm guessing `_` characters need to be changed to `.`. Correct? – Daniel Fletcher Apr 22 '15 at 04:15
  • 3
    This case was a bit unusual. I looked for a function that I thought might do formatting, found `object.size()`, then looked at the source (type `object.size` without the parentheses). I saw that it returns an object of type "object_size". (But it's really not that common to use the function with periods replaced with underscores and it could be anything). Then I looked for methods for that class with `methods(class="object_size")` and found the formatting function. – MrFlick Apr 22 '15 at 04:21
  • 8
    The proper way to call `utils:::format.object_size()` is to call `format()` and make sure the object passed has the class attribute set. This can be done as `size <- structure(size, class="object_size")` and then `format(size, units="auto")`, or in one go as `format(structure(size, class="object_size"), units="auto")`. – HenrikB May 01 '15 at 14:11
  • 1
    By now also SI units are supported: `format(structure(2^32-1, class="object_size"), units="auto", standard="SI")` Thanks @HenrikB, see https://github.com/HenrikBengtsson/Wishlist-for-R/issues/6 – ismirsehregal Oct 22 '18 at 12:54
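The `format(structure(...))` approach from the comments can be wrapped in a small helper. This is a sketch only: `format_bytes()` is a hypothetical name, and the `vapply()` loop is there on the assumption that the `object_size` formatter handles one value at a time when `units = "auto"`. The `standard` argument needs a version of R that supports it.

```r
# Hypothetical wrapper: format each byte count through the exported format()
# generic by tagging the value with the "object_size" class.
format_bytes <- function(bytes, units = "auto", standard = "auto") {
  vapply(
    bytes,
    function(b) {
      format(structure(b, class = "object_size"),
             units = units, standard = standard)
    },
    character(1)
  )
}

sizes <- format_bytes(c(124, 60963))
sizes
# [1] "124 bytes" "59.5 Kb"

# Same value in SI (base 1000) units
si_size <- format_bytes(60963, standard = "SI")
```

Unlike `utils:::format.object_size()`, this uses only exported functions, so it does not depend on the `:::` operator.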
23

Use the humanReadable() function in the gdata package. It has options to report the size in base 1000 ('SI') or base 1024 ('IEC') units, and it is also vectorized so you can process an entire vector of sizes at the same time.

For example:

    > humanReadable(c(60810, 124141, 124, 13412513), width=4)
    [1] "60.8 kB" "124 kB"  "124 B"   "13.4 MB"
    > humanReadable(c(60810, 124141, 124, 13412513), standard="IEC", width=4)
    [1] "59.4 KiB" "121 KiB"  "124 B"    "12.8 MiB"

I'm currently working to prepare release 2.16.0 of gdata, which adds the ability to indicate which unit you would like to use for reporting the sizes, as well as "Unix"-style units.

    > humanReadable(c(60810, 124141, 124, 13412513), standard="SI", units="kB")
    [1] "   60.8 kB" "  124.1 kB" "    0.1 kB" "13412.5 kB"
    > humanReadable(c(60810, 124141, 124, 13412513), standard="IEC", units="KiB")
    [1] "   59.4 KiB" "  121.2 KiB" "    0.1 KiB" "13098.2 KiB"
    > humanReadable(c(60810, 124141, 124, 13412513), standard="Unix", units="K")
    [1] "   59.4 K" "  121.2 K" "    0.1 K" "13098.2 K"

-Greg [maintainer of the gdata package]

Update

CRAN has accepted gdata version 2.16.1, which supports the standard="Unix" and units= options, and it should be available on CRAN mirrors shortly.

  • 2
    I second using `gdata::humanReadable()` for this, especially since R's own `format()` function for `object_size` objects uses *incorrect* notation, e.g. Kb (=Kbits) when it should use KB (or KiB), cf. https://stat.ethz.ch/pipermail/r-devel/2014-September/069755.html – HenrikB May 01 '15 at 14:14