How to recode multiple variables for a subset of a dataframe?

Question

I'm lost, so any directions would be helpful. Let's say I have a dataframe:

df <- data.frame(
  id = 1:12,
  v1 = rep(c(1:4), 3),
  v2 = rep(c(1:3), 4),
  v3 = rep(c(1:6), 2),
  v4 = rep(c(1:2), 6))

My goal would be to recode 2=4 and 4=2 for variables v3 and v4 but only for the first 4 cases (id < 5). I'm looking for a solution that works for up to twenty variables. I know how to do basic recoding but I don't see a simple way to implement the subset condition while manipulating multiple variables.

score 3 · Answer 1 · answered Dec 18 '19 at 14:18

You can try mutate_at with case_when from dplyr:

library(dplyr)

df %>%
  mutate_at(vars(v3:v4), ~case_when(id < 5 & . == 4 ~ 2L,
                                    id < 5 & . == 2 ~ 4L,
                                    TRUE ~ .))
#   id v1 v2 v3 v4
#1   1  1  1  1  1
#2   2  2  2  4  4
#3   3  3  3  3  1
#4   4  4  1  2  4
#5   5  1  2  5  1
#6   6  2  3  6  2
#7   7  3  1  1  1
#8   8  4  2  2  2
#9   9  1  3  3  1
#10 10  2  1  4  2
#11 11  3  2  5  1
#12 12  4  3  6  2

With mutate_at you can specify a range of columns to apply the function to.

Ronak Shah

Ok, thank you! This looks like a nice solution. What I don't quite understand is the use of "~" and "." here, so I don't really see the role of "TRUE ~." at the end. The dot sort of works like a loop through the variables? – 2freet Dec 18 '19 at 14:34

@2freet `TRUE ~ .` at the end covers all the remaining cases (the value is not 4 or 2, or id is not below 5), so in those cases it keeps the same value. `~` is a formula-style syntax, you can read more about it at `?case_when`. – Ronak Shah Dec 18 '19 at 14:45
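
To make the role of `~` and `.` concrete, here is a minimal sketch of the same recode written with an explicit anonymous function, where `x` stands in for `.` and receives one whole column at a time (v3, then v4). It assumes dplyr 1.0.0 or later, where `across()` is available and supersedes `mutate_at()`; the name `res` is only for illustration.

```r
library(dplyr)

df <- data.frame(
  id = 1:12,
  v1 = rep(1:4, 3),
  v2 = rep(1:3, 4),
  v3 = rep(1:6, 2),
  v4 = rep(1:2, 6))

# `function(x) ...` is the long form of the `~ ...` shorthand;
# `x` is the current column, `id` is looked up in the data frame.
res <- df %>%
  mutate(across(v3:v4, function(x) {
    case_when(id < 5 & x == 4 ~ 2L,  # swap 4 -> 2 in the first four rows
              id < 5 & x == 2 ~ 4L,  # swap 2 -> 4 in the first four rows
              TRUE ~ x)              # everything else is kept as it is
  }))
```

The column range `v3:v4` can be widened (for example to `v1:v20`, if such columns exist) without touching the recoding logic.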

score 3 · Accepted Answer · answered Dec 18 '19 at 14:29

Here is a base R solution:

# rows 1:4 are the cases with id < 5
df[1:4, c('v3', 'v4')] <- lapply(df[1:4, c('v3', 'v4')], function(i)
                                       ifelse(i == 2, 4, ifelse(i == 4, 2, i)))

which gives:

   id v1 v2 v3 v4
1   1  1  1  1  1
2   2  2  2  4  4
3   3  3  3  3  1
4   4  4  1  2  4
5   5  1  2  5  1
6   6  2  3  6  2
7   7  3  1  1  1
8   8  4  2  2  2
9   9  1  3  3  1
10 10  2  1  4  2
11 11  3  2  5  1
12 12  4  3  6  2
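
Since the question asks for something that scales to about twenty variables and a condition on id, the same idea can be sketched with the rows and columns computed rather than typed out. This is only an illustration under assumptions: the names `rows` and `cols` are hypothetical, and the columns are assumed to follow the `v<number>` naming pattern.

```r
df <- data.frame(
  id = 1:12,
  v1 = rep(1:4, 3),
  v2 = rep(1:3, 4),
  v3 = rep(1:6, 2),
  v4 = rep(1:2, 6))

rows <- df$id < 5          # logical row selector instead of hard-coded 1:4
cols <- paste0("v", 3:4)   # could become paste0("v", 1:20) for twenty columns

# Recode each selected column: 2 -> 4, 4 -> 2, anything else unchanged
df[rows, cols] <- lapply(df[rows, cols], function(x)
  ifelse(x == 2, 4L, ifelse(x == 4, 2L, x)))
```

Using a logical condition for the rows keeps the code correct even if the data frame is not sorted by id.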

Sotos

Thanks as well. I think it's very useful to know the base R version of a solution! – 2freet Dec 18 '19 at 14:35
or `df[1:5, c('v3', 'v4')][df[1:5, c('v3', 'v4')] == 2 | df[1:5, c('v3', 'v4')] == 4] <- 6-df[1:5, c('v3', 'v4')][df[1:5, c('v3', 'v4')] == 2 | df[1:5, c('v3', 'v4')] == 4]` (which can probably be shortened with indices...) – Cath Dec 18 '19 at 14:40

@Cath You should put that in a new answer! – Sotos Dec 18 '19 at 14:45

score 3 · Answer 3 · answered Dec 18 '19 at 14:57

Another, more direct, option is to get the indices of the numbers to replace, and to replace them by 6 minus the number (6-4=2, 6-2=4):

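# (row, column) positions of the 2s and 4s within rows 1:4 (id < 5)
whToChange <- which(df[1:4, c("v3", "v4")] == 2 | df[1:4, c("v3", "v4")] == 4, arr.ind = TRUE)

# the row positions match those of df because the subset starts at row 1
df[, c("v3", "v4")][whToChange] <- 6 - df[, c("v3", "v4")][whToChange]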

head(df, 5)
#  id v1 v2 v3 v4
#1  1  1  1  1  1
#2  2  2  2  4  4
#3  3  3  3  3  1
#4  4  4  1  2  4
#5  5  1  2  5  1
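
The "6 minus the number" trick is a special case of a general swap: for any two values a and b, (a + b) - x turns a into b and b into a. A minimal sketch of that generalization follows; the names `a`, `b`, `rows`, `cols`, `sub` and `hit` are illustrative and not part of the answer above.

```r
df <- data.frame(
  id = 1:12,
  v1 = rep(1:4, 3),
  v2 = rep(1:3, 4),
  v3 = rep(1:6, 2),
  v4 = rep(1:2, 6))

a <- 2L; b <- 4L               # the two values to swap
rows <- df$id < 5
cols <- c("v3", "v4")

sub <- df[rows, cols]          # work on the subset only
hit <- sub == a | sub == b     # logical matrix marking cells to change
sub[hit] <- (a + b) - sub[hit] # a -> b and b -> a in one step
df[rows, cols] <- sub          # write the subset back
```

This only works for swapping exactly two numeric values; recoding three or more values needs a lookup approach instead.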

GKi · Answer 4 · 2019-12-18T15:02:11.287

You can use match and a lookup table, just in case you have to recode more than two values.

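rosetta <- matrix(c(2,4,4,2), 2)  # row 1: old values, row 2: new values
df[1:4, c("v3", "v4")] <- lapply(df[1:4, c("v3", "v4")], function(x) {
  i <- match(x, rosetta[1,])  # position of each value among the old values
  j <- !is.na(i)              # TRUE where a replacement exists
  "[<-"(x, j, rosetta[2, i[j]])
})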
df
#   id v1 v2 v3 v4
#1   1  1  1  1  1
#2   2  2  2  4  4
#3   3  3  3  3  1
#4   4  4  1  2  4
#5   5  1  2  5  1
#6   6  2  3  6  2
#7   7  3  1  1  1
#8   8  4  2  2  2
#9   9  1  3  3  1
#10 10  2  1  4  2
#11 11  3  2  5  1
#12 12  4  3  6  2
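
The same lookup idea can be spelled out with two plain vectors and a small helper, which may be easier to extend when more than two values need recoding. This is a sketch under assumptions: `from`, `to` and `recode_lookup` are hypothetical names introduced here, not part of the answer above.

```r
df <- data.frame(
  id = 1:12,
  v1 = rep(1:4, 3),
  v2 = rep(1:3, 4),
  v3 = rep(1:6, 2),
  v4 = rep(1:2, 6))

from <- c(2L, 4L)   # old values; add more entries as needed
to   <- c(4L, 2L)   # new values, in the same order as `from`

# Hypothetical helper: replace values found in `from`, keep the rest
recode_lookup <- function(x) {
  i <- match(x, from)    # index into `from`, NA where there is no match
  hit <- !is.na(i)
  x[hit] <- to[i[hit]]
  x
}

rows <- df$id < 5
cols <- c("v3", "v4")
df[rows, cols] <- lapply(df[rows, cols], recode_lookup)
```

Adding a third mapping, say 1 to 9, would only require appending to `from` and `to`.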

Also have a look at "R: How to recode multiple variables at once" or "Recoding multiple variables in R".
