I have a very large data frame (think 30-50 million records), hence I'm using data.table
for this problem. I am much more familiar with dplyr
than with data.table
.
Let's consider the following small example. Note that there are many more columns in my actual data set.
library(data.table)
library(magrittr)
library(stringi)
set.seed(42)
format_pct <- function(x){
paste0(formatC(x * 100, digits = 1, format = 'f'), "%")
}
df <- data.frame(x = c(1, NA, 2, 4, NA),
y = c(0, 1, NA, 2, 5),
huge_numsf = sample.int(500000:1000000, size = 5),
huge_numsg = sample.int(500000:1000000, size = 5),
percent_a = format_pct(runif(5)),
percent_b = format_pct(runif(5)))
> df
x y huge_numsf huge_numsg percent_a percent_b
1 1 0 457404 259548 45.8% 94.0%
2 NA 1 468537 368294 71.9% 97.8%
3 2 NA 143070 67334 93.5% 11.7%
4 4 2 415222 328495 25.5% 47.5%
5 NA 5 320871 352530 46.2% 56.0%
I would like to apply prettyNum()
to all columns except x
, y
, and columns with the string 'percent'
.
If this data frame weren't large, I would do
df[,colnames(df)[
!(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
]] <-
apply(X = df[,colnames(df)[
!(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
]],
MARGIN = 2,
FUN = prettyNum,
big.mark = ",")
> df
x y huge_numsf huge_numsg percent_a percent_b
1 1 0 457,404 259,548 45.8% 94.0%
2 NA 1 468,537 368,294 71.9% 97.8%
3 2 NA 143,070 67,334 93.5% 11.7%
4 4 2 415,222 328,495 25.5% 47.5%
5 NA 5 320,871 352,530 46.2% 56.0%
Assume now that we've made df
a data.table
; i.e.:
df <- data.frame(x = c(1, NA, 2, 4, NA),
y = c(0, 1, NA, 2, 5),
huge_numsf = sample.int(500000:1000000, size = 5),
huge_numsg = sample.int(500000:1000000, size = 5),
percent_a = format_pct(runif(5)),
percent_b = format_pct(runif(5))) %>%
data.table(.)
Is there a way to do the above using data.table
syntax?