
I have a series of lines of code that replace the contents of an existing column based on the contents of another column (i.e. I am creating a categorical variable where the 'cut' function is not applicable). I am new to R and want to write a function that will perform this task on all data.frames without having to insert and customize 50 lines of code each time.

X is the data frame, Y is the categorical variable, and Z is the other (string) variable. This code works:

X$Y <- ""
X <- transform(X, Y=ifelse(Z=="Alameda",20,""))
... (many more lines)

For example I do:

d.f$loc <- ""
d.f <- transform(d.f, loc=ifelse(county=="Alameda",20,""))
# ... and so on

Now I want to do this for several dataframes and different columns instead of loc and county. However, neither of these functions produces the desired results:

ab<-function(Y,Z,env=X) {
env$Y<-transform(env,Y=ifelse(Z=="Alameda",20,""))
...
}

abc<-function(X,Y,Z) {
X<-transform(X,Y=ifelse(Z=="Alameda",20,""))
...
}

Both of these functions run without error but do not alter the data frame X in any way. Am I doing something wrong in calling the environment or using a function within another function? It seems like a simple question and I would not post if I had not already spent 5+ hours trying to learn this. Thanks in advance!

  • R uses "call by value" **for all objects**. Only the return value goes back to the calling environment. http://stackoverflow.com/questions/20986093/parameter-passing-mechanism-in-r/20986303#20986303 – jogo Mar 28 '17 at 13:44
  • Please give more information about your dataframe and how you want to call the function, i.e. edit your question: http://stackoverflow.com/posts/43071058/edit – jogo Mar 28 '17 at 17:32

1 Answer

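R uses "call by value" for all objects. Only the return value goes back to the calling environment (see the question "parameter passing mechanism in R"). A function therefore has to return the modified data frame, and the caller has to assign that result.
You can do: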

ab <- function(X, Y, Z) { 
   X <- transform(X, Y=ifelse(Z=="Alameda",20,"")) 
   ... 
   return(X) 
}

If your data frames are in a list L, you can do lapply(L, ab) or, if needed, lapply(L, ab, Y=..., Z=...). As a result you will get a list of the modified data frames. BTW: also have a look at with() and within(), e.g. X$Y <- with(X, ifelse(Z=="Alameda",20,""))
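As a minimal sketch of the lapply() approach: the version of ab() below takes the column names as strings, and the data frames df1 and df2 are made up for illustration. It also uses NA instead of "" for non-matches so that the new column stays numeric; that choice is an assumption, not part of the original code.

```r
# Hypothetical helper: recode column Z of X into a new column Y.
# Y and Z are column names passed as strings.
ab <- function(X, Y, Z) {
  X[[Y]] <- ifelse(X[[Z]] == "Alameda", 20, NA)
  X  # the modified copy is the return value
}

# Two made-up data frames that share the column "county"
df1 <- data.frame(county = c("Alameda", "Marin"), stringsAsFactors = FALSE)
df2 <- data.frame(county = c("Napa", "Alameda"), stringsAsFactors = FALSE)
L <- list(a = df1, b = df2)

# Arguments given after the function are passed on to ab() for every element
L2 <- lapply(L, ab, Y = "loc", Z = "county")

L2$a$loc  # 20 NA
L2$b$loc  # NA 20
```

The original df1 and df2 are untouched; the modified copies live in L2.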

Implicitly returning the value

There is no need for an explicit call of return(...). You can return implicitly, relying on the fact that a function returns the value of its last evaluated expression:

ab <- function(X, Y, Z) { 
   X <- transform(X, Y=ifelse(Z=="Alameda",20,"")) 
   ... 
   X ### <<<<< last expression
}

Here is an example of how you can do it for your situation (the column names are passed as strings):

ab <- function(X, Y, Z) { 
  X[, Y] <- ifelse(X[,Z]>12,20,99) 
  # ... 
  X ### <<<<< last expression
}
B <- BOD # BOD is one of the data frames that come with R
B <- ab(B, "loc", "demand") # assign the result; the call alone does not change B
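Applied to a data frame like the one in the question, this could look as follows. It is only a sketch: the contents of d.f and the codes 20 and 99 are made up. The column names are passed as quoted strings, and the result has to be assigned back because the function works on a copy.

```r
ab <- function(X, Y, Z) {
  X[, Y] <- ifelse(X[, Z] == "Alameda", 20, 99)
  X  # value of the last expression is returned
}

# Made-up stand-in for the data frame from the question
d.f <- data.frame(county = c("Alameda", "Marin", "Alameda"),
                  stringsAsFactors = FALSE)

ab(d.f, "loc", "county")         # prints the result, d.f itself is unchanged
"loc" %in% names(d.f)            # FALSE

d.f <- ab(d.f, "loc", "county")  # assign the result to keep the change
d.f$loc                          # 20 99 20
```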
  • Unfortunately the highly voted answers in the linked question are really quite silly, and offer patently *bad advice*. Don’t use `return` in R unless you need to, it’s [cargo cult](https://en.wikipedia.org/wiki/Cargo_cult_programming). – Konrad Rudolph Mar 28 '17 at 14:24
  • @KonradRudolph ok, what do you recommend? I now edit the answer to delete the link to the question. – jogo Mar 28 '17 at 14:29
  • Your answer, as it is, is good. I’m just starting to get allergic to people who intentionally advocate ignoring the fact that R is a functional programming language, and who say we should treat it as an imperative language, warts and all. This is OK for people who don’t know better. But if people *advocate* this, they proliferate ignorance and bad programming practices. Your answer isn’t doing this! But the answers in the thread you linked to were. – Konrad Rudolph Mar 28 '17 at 14:31
  • Thanks a ton for your help! I tried the two functions above, and when I submit ab(d.f,loc,county) am getting the response: "Error in ifelse(z == "Alameda", 20, "") : object 'county' not found". The variable 'county' is a column in d.f. I tinkered around with different ways of referencing it, but have not found a solution! – Eric Mar 28 '17 at 15:47
  • @KonradRudolph I don't think there is much wrong with the use of a `return` statement, especially in some cases. For a discussion around this topic [see the question I asked a while ago](http://stackoverflow.com/questions/11738823/explicitly-calling-return-in-a-function-or-not). At heart R is a functional language, but you can program effectively in an imperative style in some cases. The dichotomy between either functional or imperative is not something I totally agree with. – Paul Hiemstra Mar 28 '17 at 17:46
  • What if I need to modify an argument and return another object? – papgeo Apr 30 '22 at 15:11
  • @papgeo You can return both (the modified argument and the other object in a list): `return(list(modarg=..., otherO=...))`. – jogo May 10 '22 at 11:52
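A small sketch of the list approach from the last comment. The function recode_and_count(), the data frame d.f and the codes 20 and 99 are hypothetical; only the element names modarg and otherO are taken from the comment.

```r
# Hypothetical function: recodes a column and also counts the matches
recode_and_count <- function(X, Y, Z) {
  hit <- X[[Z]] == "Alameda"
  X[[Y]] <- ifelse(hit, 20, 99)
  list(modarg = X, otherO = sum(hit))  # both objects go back in one list
}

d.f <- data.frame(county = c("Alameda", "Marin", "Alameda"),
                  stringsAsFactors = FALSE)

res <- recode_and_count(d.f, "loc", "county")
d.f <- res$modarg  # the modified data frame
n   <- res$otherO  # the second object, here the number of matches
```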