
I am trying to write a function that will take a data.frame, a list (or a character vector) of variable names of the data.frame and create some new variables with names derived from the corresponding variable names in the list and values from the variables named in the list.

For example, if data.frame d has variables x, y, z, w and the list of names is c('x', 'z'), the output may be vectors named x.cat and z.cat, with values based on the values of d$x and d$z.

I can do this with a loop:

df <- data.frame(x = 1:10, y = 11:20, z = 21:30, w = 41:50)

vnames <- c("x", "w")

loopfunc <- function(dat, vlst) {
  # Names of the new columns, e.g. "x" -> "x.cat"
  s <- paste(vlst, "cat", sep = ".")
  for (i in seq_along(vlst)) {
    dat[s[i]] <- NA
    dat[s[i]][dat[vlst[i]] %% 4 == 0] <- 0
    dat[s[i]][dat[vlst[i]] %% 4 == 1 | dat[vlst[i]] %% 4 == 3] <- 1
    dat[s[i]][dat[vlst[i]] %% 4 == 2] <- 2
  }
  # Return only the newly created columns
  dat[s]
}
dout <- loopfunc(df, vnames)

This outputs a 10x2 data.frame with columns x.cat and w.cat. Their values are 0, 1, or 2, depending on the remainder mod 4 of the corresponding values of df$x and df$w.
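The recoding rule itself can be separated from the column bookkeeping. The following is only a sketch: `recode_mod4` is a hypothetical helper name, and it assumes the input holds whole numbers (a fractional value would be truncated by the index lookup instead of becoming NA as in the loop).

```r
# Hypothetical helper: map the remainder mod 4 to a category.
# Assumes v contains whole numbers; NA stays NA.
recode_mod4 <- function(v) {
  lookup <- c(0, 1, 2, 1)  # remainders 0, 1, 2, 3 -> categories 0, 1, 2, 1
  lookup[v %% 4 + 1]       # shift by 1 because R indexes from 1
}

res <- recode_mod4(1:10)
print(res)
```

Because the helper is vectorized, it can be applied to a whole column at once, with no per-row logic.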

I would like to find a way to do something like this without a loop, maybe using the apply functions?

Here is a failed attempt:

noloopfunc <- function(dat, l){
  assign(l[2], NA)
  assign(l[2][d[l[1]] %% 4 == 0], 0)
  assign(l[2][d[l[1]] %% 4 == 2], 2)
  assign(l[2][(d[l[1]] %% 4 == 1) | (d[l[1]] %% 4 == 3)], 1)
  as.name(l[2])
}

newvnames <- sapply(vnames, function(x){paste(x, "cat", sep = ".")})
vpairs <- mapply(c, vnames, newvnames, SIMPLIFY = F)

lapply(vpairs, noloopfunc, d <- df)

Here the formal argument l is supposed to represent vpairs[[1]] or vpairs[[2]], both string vectors of length 2.

I found several threads on Stack Overflow about converting strings to variable names, but I couldn't find one where the variables are used this way: referred to again later and assigned values non-interactively.
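For reference, a column whose name is stored in a string can be read and written with `[[`, without `assign` or `as.name`. This is a minimal sketch, not a full solution; the names `src` and `new` are made up for illustration.

```r
df <- data.frame(x = 1:10, w = 41:50)

src <- "x"                           # source column name held in a string
new <- paste(src, "cat", sep = ".")  # derived name: "x.cat"

# [[ accepts a character string, so no conversion to a symbol is needed
df[[new]] <- NA
df[[new]][df[[src]] %% 4 == 0] <- 0
df[[new]][df[[src]] %% 4 == 1 | df[[src]] %% 4 == 3] <- 1
df[[new]][df[[src]] %% 4 == 2] <- 2

print(df[[new]])
```

The same pattern works inside a function, since it only depends on the data.frame and the two strings.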

Thanks for any help.

bee
  • You can do this with `assign`, but why would you ever want to? This isn't an end in itself, and there's almost certainly a better way to accomplish whatever you're really trying to do. Just put them in a `list` or even in an `environment`. – Gregor Thomas May 28 '15 at 23:46
  • Well I actually can't do it with assign here. I think there is a better way too or I wouldn't be asking here, would I? Please show me if you know a better way. I already know it has something to do with list or even environment too. This kind of general off hand remarks are not very helpful. – bee May 29 '15 at 00:10
  • `list2env(df[,c('x','z')])` and `attach`? I'm not a fan (and agree with @Gregor), but it's an option. – r2evans May 29 '15 at 00:12
  • And @bee, very often questions are asked poorly, so these kind of suggestions lead new programmers towards a different line of thinking about the problem. Please differentiate between "well-intentioned comment that I cannot use (for some reason)" and off-hand, which his comment was not. (I'm not saying you're a new programmer ... just speaking generically about comments like that.) – r2evans May 29 '15 at 00:14
  • Somewhat related: [30516325](http://stackoverflow.com/questions/30516325/converting-a-list-of-data-frames-into-individual-data-frames-in-r/) – r2evans May 29 '15 at 00:18
  • The loop works. But I am thinking of a scenario where there are many columns to process, e.g. varnames <- colnames(df)[500 : 1000] – bee May 29 '15 at 00:19
  • I think the better way is to go straight to your goal, whatever you want to **do** with these variables. Share with us what your next step(s) are and maybe we can help you get there without this step. If you start this way I think your next question will be about trouble getting `eval(parse(...))` to work for you, which is hard to write, harder to debug, and hardest to maintain. There is almost certainly a better way but you need to tell us where you're going. – Gregor Thomas May 29 '15 at 00:30
  • @Gregor My goal is to recode a bunch of continuous variables into categorical ones in such a way that the names of the new variables are derived from the continuous variables in a systematic way. It is easy to do this by hand if there are not too many variables but I want a function which will do this automatically with any data.frame and any list of variable names in it without hard coding. I can add an option to specify the criterion of recoding each variable if the criteria are different. But I leave it out to avoid cluttering up the example. I think 6pool's answer may do it. – bee May 29 '15 at 18:01
  • Why do you want the variables in the global environment rather than as new columns in your existing data or in a new data frame? – Gregor Thomas May 29 '15 at 18:20
  • I never said I want them in the globalenv. I tried to show one possible way I thought might do what I wanted (but didn't work) I want to achieve what the loop does but without loop. In the process of trying out other ways I thought having strings to be treated as variable names might be the issue. Related threads I found here don't help because they all assume interactive use cases. – bee May 29 '15 at 19:19

1 Answer


You can replace your loop with an apply variant:

dout <- as.data.frame(sapply(vnames, function(x) {
    out <- rep(NA, nrow(df))
    out[df[,x] %% 4 == 0] <- 0
    out[df[,x] %% 4 == 1 | df[,x] %% 4 == 3] <- 1
    out[df[,x] %% 4 == 2] <- 2
    out
}))
names(dout) <- paste(vnames, "cat", sep=".")
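If the same pattern is needed for arbitrary data frames and recoding rules, it could be wrapped in a function along these lines. This is a sketch under assumptions: `recode_columns`, `mod4_rule`, and the `suffix` argument are hypothetical names, and the rule function is assumed to be vectorized over a column of whole numbers.

```r
# Hypothetical wrapper: apply a recoding function f to the columns named in
# vars and return a data.frame whose columns are named "<var>.<suffix>".
recode_columns <- function(dat, vars, f, suffix = "cat") {
  out <- lapply(dat[vars], f)  # a data.frame is a list of columns
  names(out) <- paste(vars, suffix, sep = ".")
  as.data.frame(out)
}

# Example rule: remainders 0, 1, 2, 3 -> categories 0, 1, 2, 1
mod4_rule <- function(v) c(0, 1, 2, 1)[v %% 4 + 1]

df <- data.frame(x = 1:10, y = 11:20, z = 21:30, w = 41:50)
dout <- recode_columns(df, c("x", "w"), mod4_rule)
print(dout)
```

Passing the rule as an argument keeps the naming logic in one place, so a different criterion only requires a different function.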
Rorschach
  • Hi, thanks a ton! This is very simple and elegant and no esoteric assign and eval statements. Just what I want. – bee May 29 '15 at 19:08
  • I didn't see this before because somehow I thought 'out' in the function needed to have name depending on x, this example clears up a lot of my confusions. Thanks again. – bee May 29 '15 at 19:26
  • @bee glad it helped. I think some of the initial confusion came from trying to create names inside of the function. – Rorschach May 29 '15 at 19:30