Using a loop to define values of case_when in R

Question

I'm currently using case_when to define a new variable in my data as such:

data[,46] <- NA

data[,46] <- case_when(
   data[,35] ==  1 ~ data[,36],
   data[,35] ==  2 ~ data[,37],
   data[,35] ==  3 ~ data[,38],
   data[,35] ==  4 ~ data[,39],
   data[,35] ==  5 ~ data[,40],
   data[,35] ==  6 ~ data[,41],
   data[,35] ==  7 ~ data[,42],
   data[,35] ==  8 ~ data[,43],
   data[,35] ==  9 ~ data[,44],
   data[,35] ==  10 ~ data[,45]
)

I'm trying to write a loop to make this function more efficient, but am running into some trouble. Here is what I have attempted:

for (j in 1:10) {
  data[,46] <- case_when(
    data[,35] == j ~ data[,35+j]
  )
}

However, this is returning NAs for all of my values of data[,46]. Any thoughts on what might be going wrong? I would be happy to provide sample data if necessary, but I'm thinking this is more related to me making a simple programming mistake. Thanks in advance!

This seems like a better problem to solve by reshaping your data, perhaps with `tidyr`. It would be easier to help if you provided a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show what your real goal is rather than just the code you tried to write to solve it. – MrFlick, Oct 08 '18 at 18:29

Rui Barradas · Answer 1 · 2018-10-09T09:44:49.747

3

All you have to do is to remember that R is vectorized.
You are comparing data[, 35] to the integers 1 to 10 and for each of these assign data[, 35 + <1 to 10>] back to data[, 35]. So all you have to do is

data[, 35] <- data[, 35 + data[, 35]]

If there are values in data[, 35] not in 1:10 then an ifelse will be more appropriate.

data[, 35] <- ifelse(data[, 35] %in% 1:10, data[, 35 + data[, 35]], data[, 35])
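Note that indexing a data frame with a vector in the column position selects whole columns, not one cell per row. For a row-by-row lookup, base R's matrix indexing is one option. The following is a minimal sketch on a hypothetical toy data frame: the names `toy`, `sel`, `v1` to `v3` and `picked` are illustrative assumptions, with `sel` standing in for data[, 35] and `picked` for a new result column such as data[, 46].

```r
# Hypothetical toy data: `sel` plays the role of data[, 35], and v1..v3 stand
# in for the candidate columns data[, 36] onward. Names are illustrative only.
toy <- data.frame(
  sel = c(2, 1, 3, 5),
  v1  = c(10, 20, 30, 40),
  v2  = c(11, 21, 31, 41),
  v3  = c(12, 22, 32, 42)
)

sel_col <- 1                       # position of the selector column
n_opts  <- 3                       # number of candidate columns
valid   <- toy[, sel_col] %in% seq_len(n_opts)

# Two-column index matrix: one (row, column) pair per valid row.
idx <- cbind(which(valid), sel_col + toy[valid, sel_col])

# Pull one cell per row from the candidate columns; other rows stay NA.
vals <- as.matrix(toy)
toy$picked <- NA_real_
toy$picked[valid] <- vals[idx]

toy$picked
```

Rows whose selector falls outside the valid range (the last row here) are left as NA rather than triggering an "undefined columns selected" error.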

edited Oct 09 '18 at 09:44

answered Oct 08 '18 at 19:28

Rui Barradas

Not exactly. I'm checking to see whether data[,35] is equal to the values of 1-10 and depending on that, inputting data[,36] into data[,46] into the values where data[,35] == 1, data[,37] into data[,46] when data[,35]==2, etc. Doing `data[, 35] <- data[, 35 + data[, 35]]` gives me the following error: ``Error in `[.data.frame`(data, , 35 + data[, 35]) : undefined columns selected`` – Julian Oct 08 '18 at 22:29
@Zereg Then you must have values not in 1-10. See the edit. – Rui Barradas Oct 09 '18 at 09:42

score 1 · Accepted Answer · answered Oct 08 '18 at 18:59

You may need to index with [j], as shown below, so that each iteration's result is stored in data[,46]:

for (j in 1:10) {
  data[,46][j] <- case_when(
    data[,35] == j ~ data[,35+j]
  )
}
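For context on the all-NA result in the question: the original loop overwrites the whole column on every pass, and a `case_when` with a single condition returns NA for every row that does not match, so only rows matching the final value of j can survive. One alternative is to assign only to the matching rows on each pass. The following is a minimal base R sketch on a hypothetical toy data frame (the names `toy`, `sel`, `v1` to `v3` and `out` are assumptions for illustration, not the asker's data):

```r
# Hypothetical toy data: `sel` stands in for data[, 35], v1..v3 for the
# candidate columns, and `out` for the result column data[, 46].
toy <- data.frame(
  sel = c(2, 1, 3, 1),
  v1  = c(10, 20, 30, 40),
  v2  = c(11, 21, 31, 41),
  v3  = c(12, 22, 32, 42)
)
toy$out <- NA_real_

for (j in 1:3) {
  rows <- which(toy$sel == j)       # rows whose selector equals j
  toy$out[rows] <- toy[rows, 1 + j] # fill only those rows; keep earlier results
}

toy$out
```

Because each pass touches only its own rows, values written in earlier iterations are not replaced with NA.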

e.matt

Thank you! Your solution worked for me about an hour ago... but now I feel like I'm going crazy because it's not replicating. I'm getting this error now: `for (j in 2:10) { data[,46][j] <- case_when( data$since == 1 ~ lag(data[,31], 1), data$since == j ~ data[,36+j] ) }` (I know the code is a bit different, I kept the example in the original post simple to make the question as easy to answer as possible). Any thoughts as to what's going on? – Julian Oct 08 '18 at 22:32
It’s hard to understand without knowing your data fully. The lag function may be causing the result stored in data[,46] to be smaller than the dimensions of the data frame, i.e. you have one result fewer than the number of rows in your data frame. – e.matt Oct 09 '18 at 04:47
