I want to do what seems like a straightforward application of mapply in a data.table: transform a series of columns using the value in another column. Here's my function. y is the single column whose values multiply the result, and xIn is one of the columns the operation is applied to.

f.xRatio <- function(xIn, y) {return(y * (xIn + 1)/(xIn - 1))}

I have a data table with a column called GDPRatio and some columns with names like x.food1, x.food2, etc. I put these column names into a variable called x with

x <- paste0("x.", foodNames)

I create another variable with the names of the new columns created with the function

xRatio <- paste0("xRatio.", foodNames)

Here are two versions of my attempt at using mapply to create the xRatio columns from the function.

dt[, (xRatio) := mapply(FUN = f.xRatio, xIn = .SD, y = GDPRatio), .SDcols = (x)]

dt[, (xRatio) := mapply(FUN = f.xRatio, xIn = .(x), y = GDPRatio)]

Neither works. I think the first is close. I'm hoping someone can point out the flaw(s) in my logic without me creating a reproducible example.

JerryN

2 Answers

If we are using Map/mapply, wrap the single column 'GDPRatio' in a list so that it is treated as one unit and recycled across each column in .SD.

dt[, (xRatio) := Map(f.xRatio, .SD, list(GDPRatio)), .SDcols = x]

Otherwise, each element of the GDPRatio vector is treated as a separate unit and paired with the columns of .SD one by one, which produces the length problems seen with the OP's code:

dt[, (xRatio) := Map(f.xRatio, .SD, GDPRatio), .SDcols = x]

Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter
2: In `[.data.table`(dt, , `:=`((xRatio), Map(f.xRatio, .SD, GDPRatio)), :
  Supplied 2 columns to be assigned a list (length 5) of values (3 unused)
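To see the recycling difference in isolation, here is a minimal base R sketch (no data.table needed). The names cols, y, wrapped and bare are made up for illustration; the anonymous function only reports how many elements of y each call receives.

```r
# Two "columns" of length 5, standing in for .SD (illustrative data)
cols <- list(a = 2:6, b = 6:10)
y <- c(0.5, 0.2, 0.3, 0.4, 0.1)

# list(y) has length 1, so the whole vector is recycled to every column
wrapped <- Map(function(x, y) length(y), cols, list(y))

# A bare vector has length 5, so Map pairs its elements one at a time
# and recycles the 2 columns over 5 calls (hence the warning)
bare <- suppressWarnings(Map(function(x, y) length(y), cols, y))

length(wrapped)  # one result per column
length(bare)     # one result per element of y
```

With the list wrapper there is one call per column and each call sees all five values of y; without it there are five calls, each seeing a single value.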

data

library(data.table)
foodNames <- c("food1", "food2")
x <- paste0("x.", foodNames)
xRatio <- paste0("xRatio.", foodNames)

set.seed(24)
dt <- data.table(x.food1 = 2:6, x.food2 = 6:10, val = rnorm(5), 
                GDPRatio = c(0.5, 0.2, 0.3, 0.4, 0.1))
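Putting the pieces together, the following is a sketch that assumes the data.table package is installed and reuses the OP's f.xRatio with a trimmed version of the sample data. The object alt and the lapply variant are included only as a cross-check; they are not part of the answer above.

```r
library(data.table)

foodNames <- c("food1", "food2")
x <- paste0("x.", foodNames)            # source columns
xRatio <- paste0("xRatio.", foodNames)  # columns to create

f.xRatio <- function(xIn, y) y * (xIn + 1) / (xIn - 1)

dt <- data.table(x.food1 = 2:6, x.food2 = 6:10,
                 GDPRatio = c(0.5, 0.2, 0.3, 0.4, 0.1))

# Map with GDPRatio wrapped in a list: one call per column in .SD
dt[, (xRatio) := Map(f.xRatio, .SD, list(GDPRatio)), .SDcols = x]

# Cross-check on a copy: lapply passes GDPRatio as a fixed extra argument
alt <- copy(dt)[, (xRatio) := lapply(.SD, f.xRatio, y = GDPRatio), .SDcols = x]

all.equal(dt, alt)
```

copy() is used for the cross-check because := modifies a data.table by reference; without it, alt and dt would be the same object.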
akrun

Consider skipping the apply loop and running vectorized arithmetic across the subset of columns:

dt[, xRatio] <- dt$GDPRatio * (dt[, foodNames, with=FALSE]  + 1) / 
                              (dt[, foodNames, with=FALSE]  - 1)
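The same idea can be sketched in base R with a plain data.frame, which may make the recycling easier to see: a vector with one value per row is applied down every column. The names df, gdp, vectorized and looped are illustrative only, and the data are made up.

```r
# Illustrative data: two food columns and one ratio value per row
df <- data.frame(apple = c(2, 3, 4), banana = c(5, 6, 7))
gdp <- c(0.5, 0.2, 0.3)

f.xRatio <- function(xIn, y) y * (xIn + 1) / (xIn - 1)

# Vectorized: arithmetic on the whole data.frame, gdp reused for each column
vectorized <- gdp * (df + 1) / (df - 1)

# Loop version for comparison: apply the function column by column
looped <- as.data.frame(lapply(df, f.xRatio, y = gdp))

all.equal(vectorized, looped)
```

Both forms evaluate the same expression per column; the vectorized form just lets the arithmetic operators do the iteration.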

This is equivalent to @Frank's suggestion and @akrun's answer, as the random data below shows (here the source columns are named by foodNames directly):

library(data.table)
foodNames <- c("apple", "banana", "orange")

set.seed(4252018)  # SEEDED FOR REPRODUCIBILITY

dt <- data.table(
  apple = abs(rnorm(50)) * 100,
  banana = abs(rnorm(50)) * 100,
  orange = abs(rnorm(50)) * 100,
  GDPRatio = abs(rnorm(50))
)

f.xRatio <- function(xIn, y) {return(y * (xIn + 1)/(xIn - 1))}
xRatio <- paste0("xRatio.", foodNames)

# @Parfait's NO LOOP FUNCTION
dt[, xRatio] <- dt$GDPRatio * (dt[, foodNames, with=FALSE]  + 1) / 
                              (dt[, foodNames, with=FALSE]  - 1)

# @Frank's COMMENT
# copy() keeps dt untouched, since := modifies by reference; the inputs are the foodNames columns
frank_dt <- copy(dt)[, (xRatio) := lapply(.SD, f.xRatio, y = GDPRatio), .SDcols = foodNames]

all.equal(dt, frank_dt)
# [1] TRUE
identical(dt, frank_dt)
# [1] TRUE

# @akrun'S ANSWER
akrun_dt <- copy(dt)[, (xRatio) := Map(f.xRatio, .SD, list(GDPRatio)), .SDcols = foodNames]

all.equal(dt, akrun_dt)
# [1] TRUE
identical(dt, akrun_dt)
# [1] TRUE
Parfait
  • These are all great answers! The pros of Frank's and akrun's answers are that they illustrate different ways to use the Map and apply approaches. They also don't create new copies of the dt. Parfait's approach is a bit clearer to read because it combines the formula with the actual calculation. Are there any cons to be considered? – JerryN Apr 25 '18 at 17:29
  • Consider reading this excellent post on apply loops and vectorization: https://stackoverflow.com/q/28983292/1422451. If you have the opportunity to vectorize your code, as your situation allows, many will agree that is the approach to take. – Parfait Apr 25 '18 at 17:35