
Here is the source code for the "ave" function in R:

function (x, ..., FUN = mean) 
{
    if (missing(...)) 
        x[] <- FUN(x)
    else {
        g <- interaction(...)
        split(x, g) <- lapply(split(x, g), FUN)
    }
    x
}
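The first branch of `ave()` (no grouping variables) also relies on a replacement call: `x[] <- FUN(x)` is `[<-` with an empty index, so the single summary value is recycled over every position of `x` instead of replacing `x` with a length-1 result. The snippet below is a small illustration using the `x` from the question; `y` and `z` are just scratch names.

```r
x <- 10 * 1:6

# With no grouping variables, ave() takes the `x[] <- FUN(x)` branch.
ave(x)                 # [1] 35 35 35 35 35 35

# `y[] <- value` is a replacement call (`[<-` with an empty index):
# the scalar mean is recycled over every element, so y keeps its length.
y <- x
y[] <- mean(y)
identical(y, ave(x))   # TRUE

# A plain assignment, by contrast, yields a length-1 vector.
z <- mean(x)
length(z)              # 1
```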

I am having trouble understanding how the assignment "split(x, g) <- lapply(split(x, g), FUN)" works. Consider the following example:

# Overview: function inputs and outputs
> x = 10*1:6
> g = c('a', 'b', 'a', 'b', 'a', 'b')
> ave(x, g)
[1] 30 40 30 40 30 40

# Individual components of "split" assignment
> split(x, g)
$a
[1] 10 30 50
$b
[1] 20 40 60
> lapply(split(x, g), mean)
$a
[1] 30
$b
[1] 40

# Examine "x" before and after assignment
> x
[1] 10 20 30 40 50 60
> split(x, g) <- lapply(split(x, g), mean)
> x
[1] 30 40 30 40 30 40

Questions:

• Why does the assignment "split(x, g) <- lapply(split(x, g), mean)" directly modify x? Does "<-" always modify the first argument of a function, or is some other rule at work?

• How does this assignment work at all? The results of both "split" and "lapply" have lost the original ordering of x, and both have length 2. How do you end up with a vector of length(x) that matches the original ordering of x?

adn bps

Answer:


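This is a tricky one. `<-` does not usually work this way. What is actually happening is that you are not calling `split()`; you are calling a replacement function named `split<-`. R evaluates `split(x, g) <- value` roughly as ``x <- `split<-`(x, g, value = value)``, so the result is assigned back to `x`. The documentation of `split` says: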
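To see that the rule is general rather than specific to `split`, here is a small sketch. `second<-` is a hypothetical replacement function defined only for illustration: any function named `name<-` whose last argument is `value` can appear on the left of `<-` as `name(obj, ...) <- value`. The second half calls base R's `split<-` directly and checks that it gives the same result as the assignment form from the question.

```r
# Hypothetical replacement function, for illustration only.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 99   # evaluated roughly as: v <- `second<-`(v, value = 99)
v                 # [1]  1 99  3

# The same rule applies to split(): R looks up and calls `split<-`.
x <- 10 * 1:6
g <- c('a', 'b', 'a', 'b', 'a', 'b')

x1 <- x
split(x1, g) <- lapply(split(x1, g), mean)

# Calling the replacement function explicitly and keeping its result.
x2 <- `split<-`(x, g, value = lapply(split(x, g), mean))

identical(x1, x2) # TRUE
x2                # [1] 30 40 30 40 30 40
```

So `<-` does not modify the first argument of an arbitrary function; the `f(x, ...) <- value` syntax only works because a function called `f<-` exists, and its return value is what gets assigned to `x`.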

[...] The replacement forms replace values corresponding to such a division. unsplit reverses the effect of split.
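As for how the original ordering is recovered: the replacement form does not need to reverse the grouped values themselves. It can split the *positions* `seq_along(x)` with the same grouping, which tells it exactly where each group's elements came from, and then write each element of `value` back into those positions. The function below is a simplified, hypothetical sketch of that idea (`split_assign` is not a base R function, and R's actual `split<-` method has extra arguments such as `drop`).

```r
# Hypothetical helper: a simplified sketch of what a `split<-` method
# for plain vectors has to do. Not R's actual source.
split_assign <- function(x, f, value) {
  # Split the positions of x by the same grouping used in split(x, f).
  idx <- split(seq_along(x), f)    # $a: 1 3 5, $b: 2 4 6
  for (k in seq_along(idx)) {
    # value[[k]] has length 1 here, so `[<-` recycles it over the group.
    x[idx[[k]]] <- value[[k]]
  }
  x
}

x <- 10 * 1:6
g <- c('a', 'b', 'a', 'b', 'a', 'b')

split(seq_along(x), g)                         # where each group lives
res <- split_assign(x, g, lapply(split(x, g), mean))
res                                            # [1] 30 40 30 40 30 40
```

Because `split(x, g)` and `split(seq_along(x), g)` use the same grouping, their groups come out in the same order, so the k-th element of `value` always lines up with the k-th set of positions.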

See also this answer

Stefan F
  • Yes, and interestingly, `vec[x] <- 123` also calls a function, `` `[<-` ``, like this: ``vec <- `[<-`(vec, x, 123)`` – digEmAll Apr 01 '18 at 19:51
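A quick check of the comment above: ordinary subassignment and an explicit call to the primitive `` `[<-` `` (with the result assigned back) produce the same vector. `vec`, `idx`, `v1` and `v2` are illustrative names only.

```r
vec <- c(1, 2, 3, 4)
idx <- 2:3

# Ordinary subassignment syntax.
v1 <- vec
v1[idx] <- 123

# The same operation, calling the replacement function `[<-` directly
# and assigning its result back.
v2 <- vec
v2 <- `[<-`(v2, idx, 123)

identical(v1, v2)  # TRUE
v1                 # [1]   1 123 123   4
```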