1

I am a Stata user trying to switch to R and having the usual beginner's struggle. I have been trying (and failing) to do a loop for a few days and I now surrender. What I want to do (in a loop):

  • start from a list of variable names

  • create a new variable

  • recode that new variable(s) based on the value of existing variables

  • possibly do so using the dplyr syntax, but this is not essential, only for consistency with the rest of my code.

Here is a stylised example of what I am trying to do. In my actual data, the x.x and x.y variables originate from the join function applied to 2 existing data frames.

N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)

varlist <- c("x2","x3")
lapply(varlist, function(x) {
   df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y
  })

When I run the lapply part of the code I get the error message

Error: unexpected '}' in: " df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y }"

even though it should be expected... I am sure there are a number of mistakes in my code, partly because I am used to macros in Stata, for which there is no direct equivalent in R. Anyway, if you can point me in the right direction it would be fantastic!

mariodrumblue
  • You are missing a closing bracket at the end of the `mutate( ... )` call. That is, you have 4x `(` and only 3x `)` – SymbolixAU Dec 25 '16 at 03:02
  • The error message is telling you the `}` is unexpected; it's not where it should be. – SymbolixAU Dec 25 '16 at 03:03
  • Are you after something like `df$x2 <- ifelse(df$x1 < 0, df$x2.y, df$x2.x)` ? – SymbolixAU Dec 25 '16 at 04:21
  • yes exactly. but using a loop and possibly using the dplyr syntax – mariodrumblue Dec 25 '16 at 05:24
  • Do you absolutely *have* to use a loop? when a simple `df[, varlist] <- c(ifelse(df$x1 < 0, df$x2.y, df$x2.x), ifelse(df$x1 < 0, df$x3.y, df$x3.x))` will do the job? – SymbolixAU Dec 25 '16 at 06:43
  • Hi SymbolixAU, this is just an example of what I am trying to do, but in reality I have more than just two variables and so I am trying to use a loop. I have read about the for function, and lapply, but I keep writing bad code so I'd like to learn the right syntax. – mariodrumblue Dec 25 '16 at 06:48

3 Answers

4

The reason your code doesn't work is that paste0(x, ".y") only builds the character string "x2.y" (or "x3.y"). It gives you the column's name, not the column itself, so ifelse fills the new variable with those strings instead of the data stored in that column.
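To see this on its own, here is a minimal sketch with made-up inputs (the values of `x` and `x1` below are hypothetical, chosen only for illustration). It shows that `ifelse` hands back whatever `paste0` produced, which is text:

```r
# Toy inputs (hypothetical values, for illustration only)
x  <- "x2"
x1 <- c(-1, 1)

# paste0() only builds column *names*, so ifelse() returns strings, not data
res <- ifelse(x1 < 0, paste0(x, ".y"), paste0(x, ".x"))
print(res)
# [1] "x2.y" "x2.x"
```

The result is a character vector of names, which matches the symptom of getting "x2.x" or "x2.y" as values rather than numbers.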

What you should be doing instead is subsetting the data by the column name that paste0(x, ".y") generates. For example, to get the column x2.y you can write

df[, paste0(varlist[1], ".y")]
## and of course the same can be done for second item of varlist
# df[, paste0(varlist[2], ".y")]

Now we know how to subset columns by a variable name. Because you want to learn how to write this as a loop, we can replace the numbers in varlist[1] (and varlist[2]) with a 'looping' variable.

Here are two ways to do it, one using a for loop and the other using sapply.

For loop

for(i in varlist){
  df[, i] <- ifelse(df[, "x1"] < 0, df[, paste0(i, ".y")], df[, paste0(i, ".x")])
}

head(df)
#            x1       x2.x       x2.y     x3.x       x3.y         x2        x3
# 1 -0.56047565  1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749  0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3  1.55870831  1.9820198 -2.5415892 1.551835 -2.3289958  1.9820198  1.551835
# 4  0.07050839  1.8678249 -0.7807724 2.302715 -4.2841578  1.8678249  2.302715
# 5  0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428  5.598490
# 6  1.71506499  3.0405735 -2.6152683 2.962585 -0.7946739  3.0405735  2.962585
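A variation on the for loop, offered as a sketch rather than a replacement: it uses `[[ ]]` instead of `[, ]` to pull out each column. The assumption behind it is that the real data might be a tibble (for example if it comes out of a dplyr pipeline), where `df[, "x1"]` returns a one-column tibble instead of a vector, while `[[ ]]` returns a plain vector for both data frames and tibbles. The small `N` is only to keep the example quick.

```r
set.seed(123)
N <- 10  # small N, for illustration only
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

for (v in varlist) {
  # [[ ]] extracts the column as a vector, looked up by its name as a string
  df[[v]] <- ifelse(df[["x1"]] < 0,
                    df[[paste0(v, ".y")]],
                    df[[paste0(v, ".x")]])
}

head(df)
```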

sapply

You can also do this with one of the *apply functions. Here I'm using sapply so that it 'simplifies' the result into a matrix with one column per variable (whereas lapply would return a list).

df[, varlist] <- sapply(varlist, function(x){
   ifelse(df[, "x1"] < 0, df[, paste0(x, ".y")], df[, paste0(x, ".x")])
})

head(df)
#            x1       x2.x       x2.y     x3.x       x3.y         x2        x3
# 1 -0.56047565  1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749  0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3  1.55870831  1.9820198 -2.5415892 1.551835 -2.3289958  1.9820198  1.551835
# 4  0.07050839  1.8678249 -0.7807724 2.302715 -4.2841578  1.8678249  2.302715
# 5  0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428  5.598490
# 6  1.71506499  3.0405735 -2.6152683 2.962585 -0.7946739  3.0405735  2.962585
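For comparison, here is a sketch of the `lapply` variant. It relies on the fact that a list of equal-length vectors can be assigned to several data frame columns at once with `df[varlist] <- ...`. One possible reason to prefer it: `sapply` builds a matrix, which forces all columns to a common type, whereas a list keeps each column's own type. The small `N` is only for illustration.

```r
set.seed(123)
N <- 10  # small N, for illustration only
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

# lapply() returns a list with one vector per name in varlist;
# assigning that list to df[varlist] creates the columns x2 and x3
df[varlist] <- lapply(varlist, function(v) {
  ifelse(df[["x1"]] < 0, df[[paste0(v, ".y")]], df[[paste0(v, ".x")]])
})

head(df)
```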

Data

set.seed(123)   ## setting the seed as we're sampling
N <- 1000
df  <- data.frame(x1 = rnorm(N),
                  x2.x = rnorm(N)+2,x2.y = rnorm(N)-2,
                  x3.x = rnorm(N)+3,x3.y = rnorm(N)-3)
SymbolixAU
  • thanks! your code works beautifully. I know this is a bit demanding, but would you know how to do the same thing using dplyr syntax? – mariodrumblue Dec 25 '16 at 07:47
  • I tried the following code: set.seed(123) ## setting the seed as we're sampling N <- 1000 df <- data.frame(x1 = rnorm(N), x2.x = rnorm(N)+2,x2.y = rnorm(N)-2, x3.x = rnorm(N)+3,x3.y = rnorm(N)-3) df[, varlist] <- sapply(varlist, function(x) df %>% mutate_(x = ifelse("x1" < 0, paste0(x,".y"),paste0(x,".x")) )) head(df) but then I get something really strange. The newly created x2 variable is equal to x1 (instead of x2.x), while newly created x3 variable is equal to x2.x (instead of x3.x) – mariodrumblue Dec 25 '16 at 08:01
  • @mariodrumblue I've never liked using `dplyr` syntax in a loop, particularly when using dynamically created variable names. See [this answer](http://stackoverflow.com/q/26003574/5977215) for an example of why I avoid it. – SymbolixAU Dec 25 '16 at 08:32
0

Try this:

replace mutate with mutate_. See the dplyr vignette on non-standard evaluation:

https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html
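As far as I know, the underscored verbs such as `mutate_` have been deprecated since dplyr 0.7 in favour of tidy evaluation. The following is a sketch of how the loop might look with that newer syntax, assuming dplyr 0.7 or later is installed: `!!v :=` names the new column from the string in `v`, and `.data[[...]]` looks up an existing column by its name. The small `N` is only for illustration.

```r
library(dplyr)

set.seed(123)
N <- 10  # small N, for illustration only
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

for (v in varlist) {
  df <- df %>%
    # !!v := ...   -> the new column is named by the value of v ("x2", "x3")
    # .data[[...]] -> fetch an existing column by a name held in a string
    mutate(!!v := ifelse(x1 < 0,
                         .data[[paste0(v, ".y")]],
                         .data[[paste0(v, ".x")]]))
}

head(df)
```

Because this is a `for` loop at the top level, the reassignment `df <- ...` updates the actual data frame on each pass.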

ℕʘʘḆḽḘ
  • thanks! I tried and it almost gave me what I am after, except I get i) a new variable named x (while it should be x2 or x3 depending on the value of x in the loop) and ii) while I can see the printed output on the console, the df data frame is still unchanged. What am I missing? This the code: – mariodrumblue Dec 25 '16 at 06:38
  • lapply(varlist, function(x) df <- df %>% mutate_(x = ifelse("x1" < 0, paste0(x,".y"),paste0(x,".x")) )) head(df) – mariodrumblue Dec 25 '16 at 06:39
0

This worked for me:

lapply(varlist, function(x) 
  df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y
))

You do not need the braces to designate a loop using lapply. Read this for more info on lapply syntax.
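One caveat worth illustrating is why `df` can appear unchanged after an `lapply` call like this. An assignment made inside a function only affects a copy local to that function, while a `for` loop runs in the calling environment. A minimal sketch with a toy data frame (the column names `a` and `b` are hypothetical):

```r
df <- data.frame(a = 1:3)

# Inside the function, `df[[v]] <- ...` creates and modifies a local copy;
# the df defined above is not touched
invisible(lapply("b", function(v) {
  df[[v]] <- df$a * 2
  names(df)  # the local copy has both "a" and "b"
}))
cols_after_lapply <- names(df)
print(cols_after_lapply)
# [1] "a"

# A for loop has no function scope of its own, so df itself is updated
for (v in "b") {
  df[[v]] <- df$a * 2
}
cols_after_for <- names(df)
print(cols_after_for)
# [1] "a" "b"
```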

Jeff
  • thanks. I see the problem. However, I am still not getting what I want: – mariodrumblue Dec 25 '16 at 03:16
  • thanks. I see the problem. However, I am still not getting what I want: using your code I get i) a new variable named x (while it should be x2 or x3 depending on the value of x in the loop) and ii) the value given to x is x2.x or x2.y (while it should be the numeric value corresponding to those variables), and iii) while I can see the printed output on the console, the df data frame is still unchanged. Any hint? sorry for the many questions. Mario – mariodrumblue Dec 25 '16 at 03:22