How does variable assignment work in function calls in R?

Question

I am trying to simulate a Sierpinski triangle in R using affine transformations and an Iterated Function System (IFS). Ideally, I can then go on to simulate Barnsley's fern the same way. For those who know Chinese, this video is the starting point for this exercise.

Here is a short introduction of the simulation process:

1. Create an equilateral triangle and name its vertices A, B, C.
2. Create a random initial point lying inside triangle ABC.
3. Sample one of A, B, C with equal probability.
4. If the outcome is A, move the point to the midpoint between A and itself (likewise for B and C).
5. Repeat step 3, each time moving the latest point to the midpoint between it and the sampled vertex. Doing this repeatedly, the path of the points should trace out a Sierpinski triangle.
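The steps above can be sketched in R as follows. This is a minimal illustration, not the asker's code. The helper name `chaosGame` is an assumption, and so is the fixed starting point at the triangle's centroid. The sketch also preallocates a matrix instead of growing a data frame one row at a time. All three choices are made for brevity.

```r
# Vertices of triangle ABC, one per row.
vertices <- rbind(A = c(-1, 0), B = c(1, 0), C = c(0, sqrt(3)))

# Hypothetical helper: run n steps of the process and return every point.
chaosGame <- function(n, vertices, start = c(0, sqrt(3) / 3)) {
  # Preallocate the whole path (n + 1 points) instead of growing it.
  path <- matrix(NA_real_, nrow = n + 1, ncol = 2,
                 dimnames = list(NULL, c("X", "Y")))
  path[1, ] <- start  # assumed start: the centroid, which lies inside ABC
  # Step 3: sample all vertex indices up front, each with probability 1/3.
  picks <- sample(nrow(vertices), n, replace = TRUE)
  for (i in seq_len(n)) {
    # Step 4: move halfway from the previous point towards the chosen vertex.
    path[i + 1, ] <- (path[i, ] + vertices[picks[i], ]) / 2
  }
  path
}

set.seed(1)
pts <- chaosGame(5000, vertices)
plot(pts, pch = ".", asp = 1)
```

Because every new point is the average of two points inside the triangle, the path never leaves ABC.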

I wonder how variable assignment works inside a user-defined function. I would like to create an object (a matrix or a data frame) to store the path of simulated points, and keep updating that object to track how the points move.

The following is my current code:

# create the triangle
triangle <- matrix(c(A = c(-1,0), 
                     B = c(1, 0), 
                     C = c(0, sqrt(3))),
                   byrow = TRUE, nrow = 3, ncol = 2)
colnames(triangle) <- c("X", "Y") # axis name
rownames(triangle) <- c("A", "B", "C")

# sample an initial point inside the triangle ABC
sampleInit <- function(){
  X <- runif(1, min = -1, max = 1)
  Y <- runif(1, min = 0, max = sqrt(3))
  if( (Y >= 0) && (Y <= (sqrt(3)*X + sqrt(3))) && (Y <= -sqrt(3)*X+sqrt(3)) ){
    return(cbind(X, Y))
  } else {
    sampleInit()
  }
}
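The recursive sampleInit() works because the recursive call is the last evaluated expression in the else branch, so its value is what the outer call returns. Each rejection does add a stack frame, though. Triangle ABC covers half of the bounding box (area sqrt(3) versus 2*sqrt(3)), so long rejection streaks are unlikely but possible. Below is a sketch of an equivalent loop-based rejection sampler. The name `sampleInitLoop` is hypothetical.

```r
# Hypothetical loop-based variant of sampleInit(): draw from the bounding
# box until the point falls inside triangle ABC, without recursion.
sampleInitLoop <- function() {
  repeat {
    X <- runif(1, min = -1, max = 1)
    Y <- runif(1, min = 0, max = sqrt(3))
    # Y >= 0 holds by construction; check both slanted edges.
    if (Y <= sqrt(3) * X + sqrt(3) && Y <= -sqrt(3) * X + sqrt(3)) {
      return(cbind(X, Y))  # 1 x 2 matrix with columns "X" and "Y"
    }
  }
}

set.seed(42)
p <- sampleInitLoop()
```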

### graph: plot the triangle and the initial point together
graphics.off()  
plot(triangle, xlim = c(-1, 1), ylim = c(0, sqrt(3)))
par(new = TRUE)
plot(sampleInit(), xlim = c(-1, 1), ylim = c(0, sqrt(3)), col = "red")

### a three-sided dice: determine the direction to move along
diceRoll <- function(){
  return(sample(c("A", "B", "C"), size = 1, prob = c(1/3, 1/3, 1/3)))
}

## path
stepTrace <- as.data.frame(sampleInit())
move <- function(diceOutCome, stepTrace){
  lastStep <- tail(stepTrace, 1)
  if(diceOutCome == "A"){
    X <- (-1 + lastStep[,1])/2
    Y <- (0 + lastStep[,2])/2
  } else if(diceOutCome == "B"){
    X <- (1 + lastStep[,1])/2
    Y <- (0 + lastStep[,2])/2
  } else if(diceOutCome == "C"){
    X <- (0 + lastStep[,1])/2
    Y <- (sqrt(3) + lastStep[,2])/2
  }
  lastStep <- cbind(X, Y)
  stepTrace <- rbind(stepTrace, lastStep)
}

move(diceRoll(), stepTrace)
View(stepTrace)

Sorry for the long story instead of jumping straight to the key question. My question is that stepTrace (the object in which I want to store the path) doesn't get updated when I execute the last two lines.

I expected the assignment inside move() to update the data frame stepTrace, but it turns out it doesn't. I checked my code in the debugger and found that stepTrace did get updated inside the function call, but the newly assigned value was not passed back outside the call. That's why I would like to ask how assignment works in R. How does it differ from other general-purpose languages such as Java? (I imagine that doing this exercise in Java would not run into this kind of assignment issue. Correct me if I am wrong, since I am still new to Java.)
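Here is a small sketch of this behaviour, using hypothetical helper names. Most R objects, data frames included, behave like values. Once a function modifies an argument, it is working on its own copy, so the caller's object is unchanged.

Java passes object references by value. Mutating an object through a parameter is visible to the caller, but reassigning the parameter is not. The line `stepTrace <- rbind(...)` inside move() corresponds to that second case. R environments are the closest analogue to a mutable Java object.

```r
# Data frames behave as values: modifying the argument inside the
# function changes only the function's local copy.
appendRowValue <- function(df) {
  df <- rbind(df, data.frame(X = 0, Y = 0))
  nrow(df)  # the local copy has grown
}

path <- data.frame(X = 1, Y = 1)
localRows <- appendRowValue(path)  # 2 inside the function...
nrow(path)                         # ...but still 1 in the caller

# Environments are reference objects: changes made through the
# parameter are visible to the caller, much like a Java object.
appendRowRef <- function(state) {
  state$path <- rbind(state$path, data.frame(X = 0, Y = 0))
  invisible(NULL)
}

state <- new.env()
state$path <- data.frame(X = 1, Y = 1)
appendRowRef(state)
nrow(state$path)  # now 2
```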

Similar problems bother me when I try to assign variables inside a loop. I know there is a base function, assign, that helps to resolve this issue, but I don't know the mechanism behind it.
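Here is a sketch of what assign() does, with illustrative variable names. assign() binds a name given as a string in the environment chosen by its `envir` argument. By default, that is the environment it is called from.

Note that `for` loops do not create a scope of their own in R. A loop at top level therefore assigns into the global environment, while the same loop inside a function assigns into that function's local environment. Often a list is simpler than generated variable names.

```r
# At top level, assign() binds into the global environment.
for (i in 1:3) {
  assign(paste0("step", i), i^2)  # creates step1, step2, step3
}
step3

# Inside a function, the default target is the function's local
# environment, so the binding disappears when the call returns.
setLocal <- function() {
  assign("fromFunction", TRUE)
  exists("fromFunction", inherits = FALSE)  # TRUE inside the call
}
setLocal()
exists("fromFunction")  # FALSE afterwards

# Usually simpler: store the results in a list instead of named variables.
steps <- vector("list", 3)
for (i in 1:3) steps[[i]] <- i^2
```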

I tried to google my question, but I am not sure which keywords to use, and I didn't find direct answers. Any comments, keywords, or pointers to documentation are appreciated!

Have `move` return `stepTrace` and then do `stepTrace <- move(diceRoll(), stepTrace)`. You need to understand that everything that has been passed to or is assigned within a function is a local variable and doesn't exist outside the function environment. — Roland, Mar 16 '22 at 06:00
You could try `<<-`. It would push the assignment up to the calling environment. Generally considered poor practice, but might be OK here. — IRTFM, Mar 16 '22 at 07:29
@IRTFM: not the calling environment, an ancestor environment (or global environment if the variable doesn't exist in the ancestors). Those are the same for `move`, but they don't have to be. Read the fine Introduction to R manual, section 10.7 "Scope". — user2554330, Mar 16 '22 at 09:50
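The distinction in the comment above comes down to where `<<-` starts its search. It begins in the function's enclosing environment, i.e. where the function was defined, not where it was called. A minimal sketch with hypothetical names:

```r
counter <- 0
bump <- function() counter <<- counter + 1  # defined at top level

caller <- function() {
  counter <- 100  # local to caller(); bump() never sees this binding
  bump()
  counter
}

caller()  # 100: caller's local counter is untouched
counter   # 1: the global counter was updated instead

# In a closure, the enclosing environment is makeCounter()'s frame,
# so `<<-` updates that private `count` rather than a global variable.
makeCounter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
tick <- makeCounter()
tick(); tick()  # the second call returns 2
```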

score 1 · Answer 1 · answered Mar 16 '22 at 06:12

In short, your move function computes the right thing, but it is not advisable to write it like that. In its current form, stepTrace is updated in the function's local environment, but not in the global environment, where your stepTrace lives. They are not the same stepTrace. To fix it, run stepTrace <- move(diceRoll(), stepTrace), but beware of the second circle of The R Inferno (growing objects). For a cleaner approach, remove the final stepTrace assignment from move, so that the function simply ends with rbind(stepTrace, lastStep).
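Here is a sketch of that fix. So that the block is self-contained, it assumes a fixed start at the triangle's centroid instead of calling sampleInit().

move() is the question's function with the final assignment replaced by its right-hand side. The updated data frame therefore becomes the visible return value, and the caller stores it. The `else stop()` branch is an added guard for unexpected input.

Calling rbind() once per step copies the whole data frame each time. For many thousands of points, a preallocated matrix scales better.

```r
move <- function(diceOutCome, stepTrace) {
  lastStep <- tail(stepTrace, 1)
  if (diceOutCome == "A") {
    X <- (-1 + lastStep[, 1]) / 2
    Y <- (0 + lastStep[, 2]) / 2
  } else if (diceOutCome == "B") {
    X <- (1 + lastStep[, 1]) / 2
    Y <- (0 + lastStep[, 2]) / 2
  } else if (diceOutCome == "C") {
    X <- (0 + lastStep[, 1]) / 2
    Y <- (sqrt(3) + lastStep[, 2]) / 2
  } else {
    stop("diceOutCome must be \"A\", \"B\" or \"C\"")
  }
  rbind(stepTrace, cbind(X, Y))  # returned, not assigned
}

# Equal probabilities are sample()'s default.
diceRoll <- function() sample(c("A", "B", "C"), size = 1)

set.seed(1)
stepTrace <- data.frame(X = 0, Y = sqrt(3) / 3)  # assumed start: centroid
for (i in 1:1000) {
  stepTrace <- move(diceRoll(), stepTrace)  # store the result in the caller
}
```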

From ?return: If the end of a function is reached without calling return, the value of the last evaluated expression is returned.

Consider the following examples:

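x <- 5
a <- b <- c <- d <- 1

f1 <- function(x) x + 1          # value of the last expression is returned
f2 <- function(x) return(x + 1)  # explicit return()
f3 <- function(x) x <- x + 1     # returns 2, but invisibly
f4 <- function(x) x <<- x + 1    # also assigns x in the enclosing (here global) environment

f1(1) # prints [1] 2
f2(1) # prints [1] 2
f3(1) # your problem: nothing is printed
f4(1) # nothing is printed; the global x (5) is replaced with 2

a <- b <- c <- d <- 1

a <- f1(1)
b <- f2(1)
c <- f3(1)
d <- f4(1)
# a, b, c and d are now all 2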

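f3 and f4 are generally considered bad practice. f4 has a side effect: it modifies a non-local variable. f3's final assignment makes the returned value invisible, which is easy to misread. f2 might trigger a discussion, since some consider an explicit return() at the end of a function unnecessary. For f3, see the result of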

c(f3(1))
#> [1] 2

Since calling f3(1) by itself printed nothing, one might expect a vector of length 0, yet the value 2 was returned all along, just invisibly. Consider removing any assignment as the last operation within your functions, and avoid naming your function arguments the same as the objects you intend to change.
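The missing piece here is visibility. The value of an assignment is returned invisibly, so f3(1) does return 2; it just isn't auto-printed at the console. Base R's withVisible() makes this explicit:

```r
f3 <- function(x) x <- x + 1

f3(1)              # nothing is printed at the console
withVisible(f3(1)) # returns list(value = 2, visible = FALSE)
print(f3(1))       # explicit printing shows the value
(f3(1))            # wrapping in parentheses also forces printing
```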

score 0 · Answer 2 · answered Mar 16 '22 at 12:44

@DonaldSeinen explained how to fix your code in his answer. I'll try to point you to documentation for more details.

First, you don't need to go to external documentation. An Introduction to R and The R Language Definition manuals are included in R distributions. The Introduction describes what's going on in lots of detail in section 10.7, "Scope". There's a different description in the Language Definition in section 3.5, "Scope of Variables".

Some people find the language in those manuals to be too technical. An easier to read external reference that gets it right is Wickham's Advanced R, readable online at https://adv-r.hadley.nz/. Scoping is discussed in chapters 6 and 7, especially sections 6.4 and 7.2.
