1

I am trying to do the same thing as this question: Add max value to a new column in R, however, I want to pass in a variable instead of the column name directly so I don't hard code the columns name into the formula.

Sample code:

a <- c(1,1,2,2,3,3)  
b <- c(1,3,5,9,4,NA)
d <- data.table(a, b)
d

a b
1 1
1 3
2 5
2 9
3 4
3 NA

I can get this:

a b max_b
1 1 3
1 3 3
2 5 9
2 9 9
3 4 4
3 NA 4

By hard coding it: setDT(d)[, max_b:= max(b, na.rm = T), a] but I would like to do something like this instead:

cn <- "b"
setDT(d)[, paste0("max_", cn):= max(cn, na.rm = T), a]

However, this is not working because inside of max() it evaluates to max of the character instead of the column. And it evaluates to a column named max_b that contains the value b because max("b") = "b". I get why this is happening, I just do not know a workaround.

What is a solution to this?

Note: the above stack question I tagged was marked as a duplicate and closed, but I chose that question because I am using the accepted answer from it in my code. I also do not 100% agree that it is a duplicate question anyways.

Bear
  • 662
  • 1
  • 5
  • 20

2 Answers2

2

Try setDT(d)[, paste0("max_", cn) := eval(parse(text = max(eval(parse(text = cn))))), a]

# output
   a b max_b
1: 1 1     3
2: 1 3     3
3: 2 5     9
4: 2 9     9
5: 3 4     4

# example with missing values
a <- c(1,1,2,2,3,3)  
b <- c(1,3,5,9,4,NA)
d <- data.table(a, b)
cn <- "b"
setDT(d)[, paste0("max_", cn) := eval(parse(text = max(eval(parse(text = cn)),
                                                       na.rm = TRUE))), a]
#output
   a  b max_b
1: 1  1     3
2: 1  3     3
3: 2  5     9
4: 2  9     9
5: 3  4     4
6: 3 NA     4
nghauran
  • 6,648
  • 2
  • 20
  • 29
  • 1
    What if I want to slap `, na.rm = T` in there? – Bear Sep 11 '18 at 21:41
  • You just have to add it in the `max()` call – nghauran Sep 11 '18 at 21:47
  • 1
    I noticed if I add another field in with only an NA value and run the same code it inputs `-INF` in there instead of `NA` since I have `na.rm = T` is there a way to leave it `NA` if there is only `NA` values, or should I just change `-Inf` to `NA` after? – Bear Sep 11 '18 at 22:11
  • 1
    It would be difficult to play with `na.rm = T` and `na.rm = F` at the same time. I would go for your second option using something like `d[d == -Inf] <- NA` – nghauran Sep 11 '18 at 22:21
0

One option is to specify the variable in .SDcols and then apply the function on .SD (Subset of Data.table).

d[, paste0("max_", cn) := lapply(.SD, max, na.rm = TRUE), by = a, .SDcols = cn]
d
#   a  b max_b
#1: 1  1     3
#2: 1  3     3
#3: 2  5     9
#4: 2  9     9
#5: 3  4     4
#6: 3 NA     4

Another option is converting to symbol and then do the evaluation

d[, paste0("max_", cn) := max(eval(as.symbol(cn)), na.rm = TRUE), by = a]
akrun
  • 874,273
  • 37
  • 540
  • 662