
The actual goal is much broader than this, but right in the middle of it all I need to perform calculations where the operator is one of the values in a data frame. The sample code below replicates three columns in the same formats as the data frame being used. In this example df, I would want to perform the operations 20+5, 10-10, and 5*15.

# R code for sample df
a <- c(20, 10, 5)
b <- c("+", "-", "*")  # already character, so as.character() is redundant
c <- c(5, 10, 15)
df <- data.frame(a, b, c, stringsAsFactors = FALSE)  # keep b as character pre-R 4.0
Bruce

3 Answers


A fairly clear way using dplyr could be:

df %>%
  mutate(d = case_when(b == "+" ~ a + c,
                       b == "-" ~ a - c,
                       TRUE ~ a * c))

Here you explicitly define the relations. As there are not that many operators, it is not that problematic.

Another way, already outlined by @Gregor involves eval(parse(...)):

df %>%
  rowwise() %>%
  mutate(d = paste(a, b, c),
         d = eval(parse(text = d)))

However, you should use it carefully. See What specifically are the dangers of eval(parse(…))?
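To make that warning concrete, here is a small hypothetical illustration (the filename is made up): eval(parse()) executes whatever string it receives, not just arithmetic.

```r
# eval(parse()) runs arbitrary R code, not just arithmetic:
x <- "20 + 5"
eval(parse(text = x))  # 25, as intended

# but a hostile or malformed value is executed just the same:
bad <- "unlink('important_file.txt')"  # hypothetical filename
# eval(parse(text = bad))              # would silently delete the file
```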

tmfmnk
  • This will perform poorly (solely by the nature of `case_when`; your implementation is clear and idiomatic to it). You might consider `group_by(b)` and a simpler `if` sequence for vectorized calculations. Perhaps slightly messy compared to this, but almost certainly faster (if data size is a factor). – r2evans Oct 16 '19 at 20:52
  • @r2evans thank you for noting it :) I was never really thinking about `case_when()` in terms of performance. – tmfmnk Oct 16 '19 at 20:58
  • I don't think it was suggested in the question that data size would be large, in which case the iterative nature of `case_when` should perform well within normal "perception tolerances". – r2evans Oct 16 '19 at 21:00
  • tmfmnk's solution worked perfectly. Before reaching out I was butchering if statements. r2evans made me go back and look at that again, and mutate(if_else...) worked very well also. I did not, however, use group_by, just straight mutate(if_else...); size in this case is not an issue. Could size or something else be a factor in wanting to incorporate group_by(b) with an if sequence instead of mutate(if_else), or should I just be happy with the wonderful solutions, get on with my life, and not overthink this? – Bruce Oct 16 '19 at 22:50
  • I personally think that it could affect performance only marginally, and even then just in the case of very large datasets (tens of millions of rows) :) But if speed is a concern, you can run a benchmark (see the `microbenchmark` package) and see how the different possibilities behave. – tmfmnk Oct 17 '19 at 09:05
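For reference, the `group_by(b)` variant that r2evans sketches in the comments could look something like this (my reading of the suggestion, not code from the thread). Each group is homogeneous in `b`, so a plain `if` on the first element suffices and `a + c` stays vectorized within the group:

```r
library(dplyr)

# sample df from the question
df <- data.frame(a = c(20, 10, 5),
                 b = c("+", "-", "*"),
                 c = c(5, 10, 15),
                 stringsAsFactors = FALSE)

df %>%
  group_by(b) %>%
  mutate(d = if (b[1] == "+") a + c
             else if (b[1] == "-") a - c
             else a * c) %>%
  ungroup()
```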
sapply(with(df, paste(a, b, c)), function(x) eval(parse(text = x)))
#>  20 + 5 10 - 10  5 * 15 
#>      25       0      75 

But beware! Things can get very messy when you go down this path: it is fragile and difficult to debug.

Gregor Thomas

If you are just using simple primitive binary operators, you can look up the corresponding functions and apply them to the values. For example:

with(df, mapply(function(op, x, y) op(x, y),
                mget(as.character(b), inherits = TRUE), a, c))

Here we use mget() to look up the function for each operator (inherits = TRUE searches up to the base environment, where the primitives live), then use mapply() to pass the other columns as parameters.
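As a quick check (the sample df is reconstructed here so the snippet runs on its own), this should return a named vector whose names come from mget():

```r
# sample df from the question
df <- data.frame(a = c(20, 10, 5),
                 b = c("+", "-", "*"),
                 c = c(5, 10, 15),
                 stringsAsFactors = FALSE)

with(df, mapply(function(op, x, y) op(x, y),
                mget(as.character(b), inherits = TRUE), a, c))
#>  +  -  * 
#> 25  0 75
```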

MrFlick