7

I have several variables in my dataset that need to be recoded in exactly the same way, and several other variables that need to be recoded in a different way. I tried writing a function to help me with this, but I'm having trouble.

library(dplyr)
recode_liberalSupport = function(arg1){
  arg1 = recode(arg1, "1=-1;2=1;else=NA")
  return(arg1)
}

liberals = c(df$var1, df$var4, df$var8)
for(i in unique(liberals)){
  paste(df$liberals[i] <- sapply(liberals, FUN = recode_liberalSupport))
}

RStudio works on this for about 5 minutes, then gives me this error message:

Error in `$<-.data.frame`(`*tmp*`, liberals, value = c(NA_real_, NA_real_,  : 
  replacement has 9 rows, data has 64600
In addition: Warning messages:
1: Unknown or uninitialised column: 'liberals'. 
2: In df$liberals[i] <- sapply(liberals, FUN = recode_liberalSupport) :
  number of items to replace is not a multiple of replacement length

Any help would be really appreciated! Thank you

Bertrand
  • You probably want to use `mutate_at` instead of `apply`, here. I think your syntax for `recode` is also not correct. Providing sample data is the best way to get working answers – Calum You Feb 07 '18 at 21:51
  • One issue is your for loop. `unique(liberals)` is going to have fewer values than `liberals` – Punintended Feb 07 '18 at 21:51
  • Does this make sense `paste(df$liberals[i] <- sapply(liberals, FUN = recode_liberalSupport))`? (The issue is with `paste`.) – Rui Barradas Feb 07 '18 at 21:57

2 Answers

19

I think this is neater with dplyr, provided recode() is used with its correct syntax. mutate_all() operates on every column of the data frame; mutate_at() operates on selected variables only. dplyr offers several ways to specify those variables.

library(dplyr)

mydata <- data.frame(arg1=c(1,2,4,5),arg2=c(1,1,2,0),arg3=c(1,1,1,1))

mydata

  arg1 arg2 arg3
1    1    1    1
2    2    1    1
3    4    2    1
4    5    0    1

mydata <- mydata %>% 
     mutate_at(c("arg1","arg2"), funs(recode(., `1`=-1, `2`=1, .default = NaN)))

mydata

  arg1 arg2 arg3
1   -1   -1    1
2    1   -1    1
3  NaN    1    1
4  NaN  NaN    1

I use NaN instead of NA because it is numeric, which makes it simpler to manage within a column of other numbers.
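To recode one group of variables one way and a second group another way, the same pattern can be applied twice inside a single mutate() call. The following is a minimal sketch, assuming dplyr 1.0 or later, where across() supersedes mutate_at(). The names liberal_vars and conservative_vars, and the choice of which column belongs to which group, are illustrative assumptions and not part of the question's data.

```r
library(dplyr)

mydata <- data.frame(arg1 = c(1, 2, 4, 5), arg2 = c(1, 1, 2, 0), arg3 = c(1, 1, 1, 1))

# Hypothetical groupings: which columns get which recoding scheme
liberal_vars      <- c("arg1")
conservative_vars <- c("arg2")

recoded <- mydata %>%
  mutate(
    # Scheme A: 1 -> -1, 2 -> 1, anything else -> NaN
    across(all_of(liberal_vars), ~ recode(.x, `1` = -1, `2` = 1, .default = NaN)),
    # Scheme B: 1 -> 1, 2 -> -1, anything else -> NaN
    across(all_of(conservative_vars), ~ recode(.x, `1` = 1, `2` = -1, .default = NaN))
  )

recoded
```

Columns not named in either vector (arg3 here) are left unchanged, and the result is an ordinary data frame that can be assigned back over the original.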

Stephen Henderson
  • I see how this works, but how can I put the recoded variables in `mydata` back into my original dataframe? – Bertrand Feb 07 '18 at 22:20
  • @Steven Ok but doesn't this essentially recode all the variables in my data the same way? What if I want to recode some variables like `1=-1, 2=1, .default = NaN` and some like `1=1, 2=-1, .default = NaN`? Then put them all back into the same dataframe – Bertrand Feb 07 '18 at 22:38
  • Use `mutate_at(var1, var3, ...etc)` – Stephen Henderson Feb 07 '18 at 23:23
  • Does `.default` mean "every other value not specified"? – coip Aug 29 '18 at 20:37
0

As always there are many ways of doing this. I don't know dplyr well enough to use that function, but this seems to be what you are looking for.

mydata <- data.frame(arg1=c(1,2,4,5),arg2=c(1,1,2,0))
mydata
  arg1 arg2
1    1    1
2    2    1
3    4    2
4    5    0

Function to recode using a nested ifelse()

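# Recode one column of `data`: 1 -> -1, 2 -> 1, everything else -> NA
recode_liberalSupport <- function(var = "arg1", data = mydata) {
  # Use the `data` argument rather than the global `mydata`
  recoded <- ifelse(data[[var]] == 1, -1,
                    ifelse(data[[var]] == 2, 1, NA))
  return(recoded)
}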

Call the function

recode_liberalSupport(var = "arg1")
[1] -1  1 NA NA

Replace the variable arg1 with recoded values.

mydata$arg1 <- recode_liberalSupport(var = "arg1") 
mydata
  arg1 arg2
1   -1    1
2    1    1
3   NA    2
4   NA    0
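To recode several columns in one step, as the question asks, the same nested ifelse() logic can be applied over a set of column names with lapply(). This is a sketch only: it assumes a variant of the function that takes the vector itself instead of a column name, and the names recode_vec and liberal_cols are hypothetical.

```r
mydata <- data.frame(arg1 = c(1, 2, 4, 5), arg2 = c(1, 1, 2, 0))

# Vectorised recode: 1 -> -1, 2 -> 1, everything else -> NA
recode_vec <- function(x) {
  ifelse(x == 1, -1, ifelse(x == 2, 1, NA))
}

# Hypothetical list of columns that share this recoding scheme
liberal_cols <- c("arg1", "arg2")

# lapply() returns one recoded vector per column; assigning the list
# back to the same columns replaces them in place
mydata[liberal_cols] <- lapply(mydata[liberal_cols], recode_vec)

mydata
```

A second function and a second vector of column names would handle the variables that need the other recoding scheme.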
akaDrHouse