A faster way to change multiple variables

Question

I am new to R and Stack, so please let me know what etiquette I may be unintentionally ignoring.

I have multiple variables I need to recode. They are consecutive. I have been using the code below, and experimenting with mutate (including 2:20 to grab those consecutive vars), but cannot get it to work. `amer` is my df.

amer$ir01 <- recode(amer$ir01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$ir02 <- recode(amer$ir02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$ir03 <- recode(amer$ir03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t01 <- recode(amer$t01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t02 <- recode(amer$t02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t03 <- recode(amer$t03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t04 <- recode(amer$t04, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m01 <- recode(amer$m01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m02 <- recode(amer$m02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m03 <- recode(amer$m03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")

From `ir01` to `m03` there are only 10 variables, but you say `2:20` (and that's 19 vars). — Rui Barradas, May 12 '19 at 17:31
create a lookup table. then (1) join to it, loop or (2) reshape long, join, reshape back — MichaelChirico, May 12 '19 at 17:33
How is this a duplicate? The other question only asks for dplyr solutions @RuiBarradas — Hector Haffenden, May 12 '19 at 17:36
@HectorHaffenden You are right, I missed that point. The alleged dupe does answer the question, though. — Rui Barradas, May 12 '19 at 17:43
Yeah, not sure if it's possible to edit that other question to accept non-dplyr solutions; then I’ll post my answer there and can leave this as a dupe? @RuiBarradas (not sure on protocol here, what would you advise) — Hector Haffenden, May 12 '19 at 17:45
@HectorHaffenden No, the other question asks for `dplyr` solutions so you should post here. — Rui Barradas, May 12 '19 at 17:48
These are examples indicative of my df. I just want to know in general how to change a consecutive series of variables without changing the entire df (like mutate_all). The variables have a Likert scale that needs to be reversed. — Mark Carroll, May 12 '19 at 17:55

Hector Haffenden · Answer 1 · 2019-05-12T18:03:53.780

This should help,

amer <- data.frame(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20)

library(car) # recode() with this "old = new; ..." string syntax comes from car; memisc::recode uses a different syntax
apply(amer, 2, function(x) recode(x, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"))

As @Rui Barradas points out, to keep the data frame class, assign the result into `amer[]` when running the apply function:

amer[] <- apply(amer, 2, function(x) recode(x, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"))

This is assuming your data looks something like,

> amer
   ir01 ir02 ir03 ...
1     1    1    1 ...
2     2    2    2 ...
3     3    3    3 ...
4     4    4    4 ...
5     5    5    5 ...
6     6    6    6 ...
7     7    7    7 ...
8     8    8    8 ...
9     9    9    9 ...
10   10   10   10 ...
11   11   11   11 ...
12   12   12   12 ...
13   13   13   13 ...
14   14   14   14 ...
15   15   15   15 ...
16   16   16   16 ...
17   17   17   17 ...
18   18   18   18 ...
19   19   19   19 ...
20   20   20   20 ...

This returns,

      ir01 ir02 ir03
 [1,]    4    4    4
 [2,]    3    3    3
 [3,]    2    2    2
 [4,]    1    1    1
 [5,]    5    5    5
 [6,]    6    6    6
 [7,]    7    7    7
 [8,]   NA   NA   NA
 [9,]   NA   NA   NA
[10,]   10   10   10
[11,]   11   11   11
[12,]   12   12   12
[13,]   13   13   13
[14,]   14   14   14
[15,]   15   15   15
[16,]   16   16   16
[17,]   17   17   17
[18,]   18   18   18
[19,]   19   19   19
[20,]   20   20   20
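
To recode only a consecutive run of columns and leave the rest of the data frame alone, the same idea can be applied to a column subset. The following is a minimal sketch in base R, not the exact method used above: the data, the `id` column, and the helper `reverse_likert()` are hypothetical, and a `match()` lookup stands in for `recode()` so that no package is needed.

```r
# Hypothetical data: an id column followed by three Likert items
amer <- data.frame(id   = 1:6,
                   ir01 = c(1, 2, 3, 4, 8, 9),
                   ir02 = c(4, 3, 2, 1, 9, 8),
                   ir03 = c(1, 1, 4, 4, 8, 2))

# Hypothetical helper: look each value up in `old` and return the matching `new`.
# Values not listed in `old` come back as NA (the recode() output above keeps
# unlisted values such as 5, 6, 7 unchanged instead).
reverse_likert <- function(x) {
  old <- c(1, 2, 3, 4, 8, 9)
  new <- c(4, 3, 2, 1, NA, NA)
  new[match(x, old)]
}

# Select the consecutive columns by position (here 2:4); names work as well
cols <- 2:4
amer[cols] <- lapply(amer[cols], reverse_likert)
amer
```

`lapply()` works column by column, so it does not first convert the data frame to a matrix the way `apply()` does, and the untouched `id` column keeps its type.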

And to keep the dataframe class you would do `amer[] <- apply(etc)`. Upvote. — Rui Barradas, May 12 '19 at 17:50
Thanks for addressing this. I must not have provided my question clearly. Take a look at my comment above. — Mark Carroll, May 12 '19 at 17:56
Hi Mark, Could you maybe provide another example with the following features, a current data set and the result you want, kind of like what I have in the current answer? I don't quite understand which part of my answer does not help. Read this for more information [How to make a great r reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) — Hector Haffenden, May 12 '19 at 18:01

jay.sf · Answer 2 · 2019-05-12T19:01:22.363

You could define a vector, `recode`, that names the variables to change, and lapply over those columns with an ifelse that does some arithmetic.

Suppose this data frame:

head(df1)
#   ir01 ir02 dont.change.me
# 1    1    4              1
# 2    8    8              2
# 3    1    8              3
# 4    1    8              4
# 5    2    4              5
# 6    4    2              6

Define recode vector,

recode <- c("ir01", "ir02")

and lapply over defined columns within:

df1[recode] <- lapply(df1[recode], function(x) ifelse(x %in% 8:9, NA, abs(x - 5)))
head(df1)
#   ir01 ir02 dont.change.me
# 1    4    1              1
# 2   NA   NA              2
# 3    4   NA              3
# 4    4   NA              4
# 5    3    1              5
# 6    1    3              6

The values are reversed, and only in the columns that should change!
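
The arithmetic inside the `ifelse` is what reverses the scale: for a scale running from 1 to 4, `abs(x - 5)` (equivalently `5 - x`) maps 1 to 4, 2 to 3, and so on. A small sketch on a standalone vector, where `x`, `lo`, `hi`, and `reversed` are names made up for illustration:

```r
x <- c(1, 2, 3, 4, 8, 9)

# 8 and 9 are treated as missing; everything else is mirrored around the scale
ifelse(x %in% 8:9, NA, abs(x - 5))
# [1]  4  3  2  1 NA NA

# General form for a scale from lo to hi: the reversed value is (lo + hi) - x,
# and anything outside the scale becomes NA
lo <- 1
hi <- 4
reversed <- ifelse(x %in% lo:hi, (lo + hi) - x, NA)
reversed
# [1]  4  3  2  1 NA NA
```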

Factors?

Sometimes those guys are factors,

df1$ir01 <- as.factor(df1$ir01)  # intentionally change `ir01` into factor
str(df1)
# 'data.frame': 20 obs. of  3 variables:
#  $ ir01          : Factor w/ 6 levels "1","2","3","4",..: 1 5 1 1 2 4 2 2 1 4 ...
#  $ ir02          : int  4 8 8 8 4 2 4 3 2 1 ...
#  $ dont.change.me: int  1 2 3 4 5 6 7 8 9 10 ...

and we could expand our function to do them:

df1[recode] <- lapply(df1[recode], 
                      function(x) {
                        if (is.factor(x))
                          x <- as.numeric(levels(x))[x]
                        ifelse(x %in% 8:9, NA, abs(x - 5))
                      })
head(df1)
#   ir01 ir02 dont.change.me
# 1    4    1              1
# 2   NA   NA              2
# 3    4   NA              3
# 4    4   NA              4
# 5    3    1              5
# 6    1    3              6
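
The `as.numeric(levels(x))[x]` step matters because calling `as.numeric()` directly on a factor returns the internal integer codes, not the labels. A short illustration with a made-up factor `f`:

```r
f <- factor(c(1, 8, 1, 2, 4))   # levels are "1", "2", "4", "8"

as.numeric(f)                   # internal codes, not the original values
# [1] 1 4 1 2 3

as.numeric(levels(f))[f]        # the original values
# [1] 1 8 1 2 4

as.numeric(as.character(f))     # same result, converting every element
# [1] 1 8 1 2 4
```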

Data

df1 <- structure(list(ir01 = c(1L, 8L, 1L, 1L, 2L, 4L, 2L, 2L, 1L, 4L, 
                               1L, 8L, 9L, 4L, 2L, 2L, 3L, 1L, 1L, 3L), 
                      ir02 = c(4L, 8L, 8L, 8L, 4L, 2L, 4L, 3L, 2L, 1L, 
                               2L, 9L, 3L, 9L, 2L, 4L, 4L, 9L, 2L, 8L), 
                      dont.change.me = 1:20), class = "data.frame", 
                 row.names = c(NA, -20L))

Awesome! That is super helpful for the 8s and 9s. Any ideas on how to reverse the 1-4? — Mark Carroll, May 12 '19 at 18:38
@MarkCarroll They actually are, check `df1` before reversing, see update. — jay.sf, May 12 '19 at 18:43

Jason Johnson · Accepted Answer · 2019-05-12T19:28:26.973

You may also want to consider a data.table solution to this problem. It scales well for large data sets with more than 100,000 rows. I use recode from the car package because it plays well with data.table; I get an error using memisc with the recode_key syntax below. Anyway, here is what you can do to put it all together:

library(data.table)
library(car)
amer <- data.table(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20) #read data in as a data.table

recode_key <- c("1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA") # modify this to add other recodes
recode_cols <- c("ir01", "ir02") # to change only specific columns, list them here

amer[, eval(recode_cols) := lapply(.SD, function(x) recode(x, recode_key)), .SDcols = recode_cols] # this changes the columns in the data.table

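Note that I used eval so that data.table assigns to the columns named in recode_cols instead of creating a new column called recode_cols (wrapping the name in parentheses, (recode_cols), does the same). The special symbol .SD stands for the subset of the data selected by .SDcols, so lapply iterates over just those columns. If you want to apply the recoding to all columns, you could leave out the .SDcols argument, remove the eval(recode_cols):= part, and start with the lapply. Keep in mind that without := the call returns a new data.table rather than changing amer in place, so you would need to assign the result.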

A final thing to note is that I did not need to assign the result of the last line of code to a variable. Part of the reason data.table is fast is that := updates the original data by reference, so no copy is made. Be careful, though: if you run the last line of code twice, the values 1 to 4 are reversed back to what you started with, while the 8s and 9s stay NA. Let me know if that explanation makes sense.
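
Since `:=` modifies the table in place and this particular recode undoes itself on the values 1 to 4, one way to guard against running it twice is to write the results to new columns. This is a sketch under assumptions, not part of the answer above: the `_r` suffix and the helper `reverse_likert()` are hypothetical, and the helper uses plain arithmetic instead of `recode` to keep the example free of the car dependency.

```r
library(data.table)

amer <- data.table(ir01 = c(1, 2, 8), ir02 = c(4, 3, 9), other = 1:3)

# Hypothetical helper: reverse a 1-4 scale and turn 8 and 9 into NA
reverse_likert <- function(x) ifelse(x %in% 8:9, NA, 5 - x)

recode_cols <- c("ir01", "ir02")
new_cols    <- paste0(recode_cols, "_r")   # hypothetical naming scheme

# (new_cols) := ... adds the columns by reference; the originals stay unchanged,
# so running this line again produces the same result
amer[, (new_cols) := lapply(.SD, reverse_likert), .SDcols = recode_cols]
amer
```

The trade-off is a wider table; if memory is tight, overwriting in place as in the answer is cheaper, at the cost of having to remember that the step has already been run.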

Very helpful. Thank you! — Mark Carroll, May 12 '19 at 21:31
