
I am trying to display a tibble with formatted numbers, so that each column is easier to read in the usual format for its data type.

Ideally, I am looking for something along the lines of the scales package for ggplot2, so that the following would be possible:

t <- tibble(
    surface = c(98000, 178000000000, 254000000), 
    price = c(517244, 939484, 1340612), 
    rate = c(0.12, 0.07, 0.045)
)
print(t,
    label = c(
        surface = label_number_si(),
        price = label_dollar(),
        rate = label_percent()
    )
)
# A tibble: 3 x 3
    surface   price    rate
     <dbl>    <dbl>    <dbl>
1      98k $  517 244  12.0% 
2     178B $  939 484   7.0% 
3     254M $1 340 612   4.5%

Currently, when printing a tibble, I get the following output, which is pretty hard to read, especially for the price column:

print(t)
# A tibble: 3 x 3
       surface   price  rate
         <dbl>   <dbl> <dbl>
1        98000  517244 0.12 
2 178000000000  939484 0.07 
3    254000000 1340612 0.045

All the similar questions I found, such as here or there, seem to revolve around scientific notation and options(scipen = xxx), which doesn't really allow defining the output as desired.

I also looked at other packages, such as units, but these don't provide specific number formatting either; they only attach a unit to the column type.

Ben

3 Answers


You can use scales::dollar() to format the price, sprintf() for the rate, and a helper function to format surface (I borrowed the one from here).

library(dplyr)

t <- tibble(
  surface = c(98000, 178000000000, 254000000), 
  price = c(517244, 939484, 1340612), 
  rate = c(0.12, 0.07, 0.045)
)

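# Format numbers with a short-scale suffix (k, M, B), keeping `digits` significant digits.
si_number <- function(x, digits) {

  # Scale x down by 10^n and round to the requested significant digits.
  compress <- function(x, n) {
    signif(x * 10^(-n), digits)
  }

  case_when(
    x >= 1e9   ~ paste0(compress(x, 9), "B"),
    x >= 1e6   ~ paste0(compress(x, 6), "M"),
    x >= 1000  ~ paste0(compress(x, 3), "k"),
    x >= 1     ~ as.character(compress(x, 0)),
    TRUE       ~ as.character(signif(x, digits))  # values below 1 (otherwise NA)
  )
}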

t2 <- t %>%
  mutate(
    surface = si_number(surface, 3),
    price   = scales::dollar(price),
    rate    = sprintf("%.1f%%", rate * 100)
  )

t2
#> # A tibble: 3 x 3
#>   surface price      rate 
#>   <chr>   <chr>      <chr>
#> 1 98k     $517,244   12.0%
#> 2 178B    $939,484   7.0% 
#> 3 254M    $1,340,612 4.5%

Created on 2020-02-24 by the reprex package (v0.3.0)

Fleur De Lys

The easiest way to change the format of a printed tibble is to create a function that prints a mutated version of the tibble.

You can use a little non-standard evaluation to pass whichever functions you want to apply to each column. I think this is very close to what you wanted:

library(tidyverse)
library(scales)

format_tibble <- function(tbl, ...)
{
  functions <- rlang::dots_list(...)
  if(length(functions) > 0)
  {
    if(length(tbl) < length(functions)) functions <- functions[seq_along(tbl)]
    columns <- names(functions)
    for(i in seq_along(columns))
    {
      fun <- functions[[i]]
      col <- as.name(columns[i])
      tbl <- mutate(tbl, !!quo_name(col) := fun(!!enquo(col)))
    }
  }
  print(tbl)
}

So now, taking your tibble:

t <- tibble( surface = c(98000, 178000000000, 254000000), 
             price   = c(517244, 939484, 1340612), 
             rate    = c(0.12, 0.07, 0.045))

We only need to do this:

t %>%
format_tibble(surface = label_number_si(),
              price   = label_dollar(),
              rate    = label_percent())
#> # A tibble: 3 x 3
#>   surface price      rate 
#>   <chr>   <chr>      <chr>
#> 1 98K     $517,244   12.0%
#> 2 178B    $939,484   7.0% 
#> 3 254M    $1,340,612 4.5%

Created on 2020-02-25 by the reprex package (v0.3.0)
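
The same idea, one named formatter per column, can also be sketched without tidy evaluation by assigning through `[[`. This is only a minimal sketch: `format_columns` is a hypothetical helper name, and the two formatters are plain base R stand-ins for the scales labellers used above. It returns the formatted copy instead of printing it, so the original numeric table stays available for calculations.

```r
# Hypothetical helper: apply one formatter per named column, without tidy eval.
format_columns <- function(tbl, ...) {
  formatters <- list(...)
  # Ignore formatters whose names do not match a column.
  formatters <- formatters[names(formatters) %in% names(tbl)]
  for (nm in names(formatters)) {
    tbl[[nm]] <- formatters[[nm]](tbl[[nm]])
  }
  tbl
}

# A plain data frame keeps the sketch free of dependencies; a tibble works the same way.
df <- data.frame(
  price = c(517244, 939484, 1340612),
  rate  = c(0.12, 0.07, 0.045)
)

out <- format_columns(
  df,
  price = function(x) paste0("$", format(x, big.mark = ",", trim = TRUE)),
  rate  = function(x) sprintf("%.1f%%", x * 100)
)

print(out)
```

Because `df` itself is untouched, it can still be summed or filtered numerically, while `out` is used for display only.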

Allan Cameron
  • So far I like your solution the most; it's already really helpful. My only concern is that with your approach, every column in the tibble now looks as if it holds only strings (<chr>). Also, more complicated tibbles such as `library(units, tidyverse, scales)`, `install_symbolic_unit("USD")` `t <- tibble( surface = set_units(c(98000, 178000000000, 254000000), yard^2), price = set_units(c(517244, 939484, 1340612), USD), rate = set_units(c(0.12, 0.07, 0.045), percent) )` would not work... – Ben Feb 25 '20 at 17:23
  • @Ben that's only because you can't use those specific functions with the specific types that you have made in the tibble. You can define whatever functions you like to display those particular types in whichever way you choose. – Allan Cameron Feb 25 '20 at 18:01
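
On the concern that the formatted columns become strings: one possible way to keep them numeric is `num()`, which changes only how a column is displayed. The sketch below assumes tibble 3.1.0 or later (where `num()` is re-exported from pillar); the `notation`, `digits`, `label` and `scale` arguments are used as I understand them from the pillar documentation, so treat the exact rendering as something to verify on your own installation. The name `tbl` is just illustrative.

```r
library(tibble)  # assumption: tibble >= 3.1.0, which provides num()

tbl <- tibble(
  # SI-style prefixes for large magnitudes
  surface = num(c(98000, 178000000000, 254000000), notation = "si"),
  # no decimals, "$" as the column label
  price   = num(c(517244, 939484, 1340612), digits = 0, label = "$"),
  # shown multiplied by 100 with one decimal, "%" as the column label
  rate    = num(c(0.12, 0.07, 0.045), digits = 1, label = "%", scale = 100)
)

print(tbl)

# The underlying storage is still double, not character.
typeof(tbl$price)
```

Note that `label` replaces the type description in the column header; it is not appended to each cell, so this does not reproduce the exact layout asked for in the question.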

Workaround by massaging the data as character vectors:

library(tibble)

options(scipen = 12)

t <- tibble(
  surface = c(98000, 178000000000, 254000000), 
  price = c(517244, 939484, 1340612), 
  rate = c(0.12, 0.07, 0.045)
)
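# temp vars
# Caveat: any value below 10^3 also falls through to "B" here.
t$KMB <- ifelse(t$surface >= 10^3 & t$surface < 10^6, "K",
  ifelse(t$surface >= 10^6 & t$surface < 10^9, "M", "B"))
# Caveat: gsub() strips every "0", not just trailing ones (e.g. "1020000" becomes "12").
t$surface_char <- gsub("0", "", as.character(t$surface))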

# paste elements together
t$surface <- paste0(t$surface_char, t$KMB)        
t$price <- paste0("$ ", t$price)
t$rate <- paste0(as.character(format(t$rate *100, nsmall = 1)), "%")

# remove temp vars
t$KMB <- NULL
t$surface_char <- NULL

print(t)
xilliam