Is it inefficient to write R code that indirectly references variables to bypass hard coding?

Question

Let's say I have the data frame below:

x <- data.frame(id= c("a", "b", "c", "d", "e")
                , term= c(179, 192, 189, 182, 179)
                , f17= c(1, 2, 3, 4, 5)
                , s18= c(6, 7, 8, 9, 10)
                , f18 = c(11, 12, 13, 14, 15)
                , s19 = c(16, 17, 18, 19, 20))

In this data frame, I want to create a variable that records the value for each id from the appropriate column for the given term (f17 corresponds to term 179, s18 corresponds to term 182, f18 corresponds to term 189, f19 corresponds to term 192).

Obviously, this can be done easily as a series of ifelse statements, but every few months I get new terms of data and I don't want to have to manually recode this each time I get more data. Also, I find this kind of coding with a lot of nested ifelse statements to be incredibly difficult to read.

I'm relatively new to R but am a very experienced SAS and SAS macro programmer, so I know that in SAS what I want to do can be done very easily with a couple of arrays and a do loop in a data step, and that's what I'm essentially trying to recreate in R. What I ended up doing was as follows.

NOTE: I realize that what is below isn't the same as a series of nested ifelse statements and is instead a sequence of ifelse statements in order that are over-writing the same variable, but this does get me the solution I need, accounting for all cases in my data.

xTerms <- c(179, 182, 189, 192)
xVars <- c("f17", "s18", "f18", "s19")

x$startVal <- NA
for(i in 1:length(xTerms)){
  x$startVal <- ifelse(x$term == xTerms[i], x[[xVars[i]]], x$startVal)
}

I should add, this is the desired result:

> x
  id term f17 s18 f18 s19 startVal
1  a  179   1   6  11  16        1
2  b  192   2   7  12  17       17
3  c  189   3   8  13  18       13
4  d  182   4   9  14  19        9
5  e  179   5  10  15  20        5

The idea with the code above being that when I get new data, I simply need to update the definitions of xTerms and xVars. Or I could even have those created dynamically based on a unique list of the values of the term variable in x and the variables in x.

I'm curious to get feedback from more experienced R users if this is the best way to tackle these sorts of iterative problems in R? Are there resources you can share on how to do this sort of a thing that takes better advantage of R?

I think a better option would be row.column indexing `x[cbind(match(xTerms, x$term), match(xVars, names(x)))]` — akrun, Jun 07 '19 at 16:05

score 2 · Accepted Answer · answered Jun 07 '19 at 16:13

2

You can use match...

xTerms <- c(179, 182, 189, 192)
xVars <- c("f17", "s18", "f18", "s19")

x$startVal <- sapply(1:nrow(x), function(i) x[i, xVars[match(x$term[i], xTerms)]])

x
  id term f17 s18 f18 s19 startVal
1  a  179   1   6  11  16        1
2  b  192   2   7  12  17       17
3  c  189   3   8  13  18       13
4  d  182   4   9  14  19        9
5  e  179   5  10  15  20        5

answered Jun 07 '19 at 16:13

Andrew Gustar

17,295
1
22
32

Fascinating! Thank you so much! You have given me much to think about here with the way you're using match! I've run into issues with functions and the apply family before but, I will keep working on that! – beri Jun 07 '19 at 16:20

IceCreamToucan · Answer 2 · 2019-06-07T18:44:20.100

2

If you put xTerms and xVars in a lookup table lkp you can convert your data to long form with melt and join with lkp to get the start vals. Then you can join back to x to add it as a column.

library(data.table)
setDT(x)

lkp <- data.table(Terms = xTerms, Vars = xVars)

startvals <- melt(x, c('id', 'term'))[lkp, on = .(term == Terms, variable == Vars)]

x[startvals, on = .(id, term), startVal := value]


x  
#    id term f17 s18 f18 s19 startVal
# 1:  a  179   1   6  11  16        1
# 2:  b  192   2   7  12  17       17
# 3:  c  189   3   8  13  18       13
# 4:  d  182   4   9  14  19        9
# 5:  e  179   5  10  15  20        5

edited Jun 07 '19 at 18:44

answered Jun 07 '19 at 18:07

IceCreamToucan

28,083
2
22
38

Thank you! Something that I've been thinking about is if the structure I'm using for my df is actually a good one, or if I should actually have the data in long form, and your response has reminded me to think more on that question. Also, I was not familiar with the syntax you're using for joining data (I have been using the merge function), so I've learned another new thing! Thank you! – beri Jun 07 '19 at 19:04
yeah, that syntax is specific to the data.table package. There's some good explanation of how it corresponds to left/right/inner at this question https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – IceCreamToucan Jun 07 '19 at 19:05

score 1 · Answer 3 · answered Jun 07 '19 at 16:17

1

An option would be to use row/column indexing

x$startVal <- x[3:6][cbind(seq_len(nrow(x)), 
             match(xVars[match(x$term, xTerms)], names(x)[3:6]))]
x
#  id term f17 s18 f18 s19 startVal
#1  a  179   1   6  11  16        1
#2  b  192   2   7  12  17       17
#3  c  189   3   8  13  18       13
#4  d  182   4   9  14  19        9
#5  e  179   5  10  15  20        5

answered Jun 07 '19 at 16:17

akrun

874,273
37
540
662

1

Thank you! I've been wary of using cbind because it basically smashes 2 columns together without actually doing a merge based on one or more id variables, which to me feels dangerous, but in this context, I can see how it's very powerful! I'll have to play with it more. – beri Jun 07 '19 at 16:23
@beri The result of `cbind` is a matrix and should be very fast compared to the `for` loop – akrun Jun 07 '19 at 16:24
1

ah that is very useful to know in terms of efficiency! Thank you! – beri Jun 07 '19 at 16:28

Is it inefficient to write R code that indirectly references variables to bypass hard coding?

3 Answers3

Linked