
Consider the following:

y<-c("A","B","C")  
x<-z<-c(1,2,3)  
names(x)<-y
"names<-"(z,y)

If you run this code, you will discover that names(x)<-y is not identical to "names<-"(z,y). In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.

Why is this? I was under the impression that the difference between writing a function normally and writing it as an infix operator was only one of syntax, rather than something that actually changes the output. Where in the documentation is this difference discussed?

J. Mini
  • Me again, your answer is in this discussion: https://stackoverflow.com/questions/15759117/what-exactly-is-copy-on-modify-semantics-in-r-and-where-is-the-canonical-source – RaphaelS Dec 21 '20 at 15:23
  • @RaphaelS That's extremely abstract. – J. Mini Dec 21 '20 at 15:28
  • See: [3.4.4 Subset assignment](https://cran.r-project.org/doc/manuals/R-lang.html#Subset-assignment) – GKi Dec 21 '20 at 15:47
  • `names(x)<-y` is actually sugar for `x<-"names<-"(x,y)` and not just `"names<-"(x,y)`. – nicola Dec 21 '20 at 15:49
  • See http://adv-r.had.co.nz/Functions.html#special-calls. When you call `names(x) <- y`, the R parser notices that the lhs is not a normal object, and therefore does two things: (1) looks for `names<-`, and if it finds it, (2) does a replacement operation on the object `x`. When you call `` `names<-`(x,y) ``, the parser sees no special lhs, and so does not schedule a replacement operation. The function is called identically in both situations, and the function does not do anything differently; it is the R parser's intentions that matter. – r2evans Dec 21 '20 at 15:49
  • @GKi That's some strange wording. I can't even tell what "_The replacement function has the same name with <- pasted on_" is talking about. **The** replacement function? – J. Mini Dec 21 '20 at 23:29
  • @nicola GKi's link is close to making that point for you, but I can't quite find where the documentation explains what you're saying. [The names documentation](https://stat.ethz.ch/R-manual/R-devel/library/base/html/names.html) appears to make no mention of this sugar. Is there somewhere in the documentation that does such a good job of explaining what "`names<-` _is a generic replacement function_" means that the fact that this is syntactic sugar becomes obvious? – J. Mini Dec 21 '20 at 23:35

2 Answers

Short answer: names(x)<-y is actually sugar for x<-"names<-"(x,y) and not just "names<-"(x,y). See the R-lang manual, pages 18-19 (pages 23-24 of the PDF), which works through essentially the same example.

For example, names(x) <- c("a","b") is equivalent to:

`*tmp*`<-x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)

For those more familiar with getters and setters: if somefunction is a getter function, somefunction<- is the corresponding setter. In R, where objects are (with few exceptions) immutable, it is more accurate to call the setter a replacement function, because the function creates a new object identical to the old one, except for an attribute that is added, modified, or removed, and the old object is then replaced with this new one.

In the example from the question, for instance, the names attribute is not simply added to x; rather, a new object with the same values as x, plus the names, is created and bound to the symbol x.
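As a minimal sketch of the point above, reusing the vectors from the question (the names `res` and `w` are introduced here purely for illustration): the plain call returns a renamed copy and leaves its argument alone, while the assignment form rebinds the symbol.

```r
y <- c("A", "B", "C")
x <- z <- c(1, 2, 3)

names(x) <- y             # assignment form: x is rebound to the renamed copy
res <- `names<-`(z, y)    # plain call: the renamed copy is returned, z is not rebound

names(x)     # "A" "B" "C"
names(z)     # NULL
names(res)   # "A" "B" "C"

# Spelling out the expansion by hand for a fresh vector w:
w <- c(1, 2, 3)
`*tmp*` <- w
w <- `names<-`(`*tmp*`, value = y)
rm(`*tmp*`)

identical(w, x)   # TRUE
```

The hand-written expansion ends up with the same object as `names(x) <- y`, which is all the sugar amounts to.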

Since there are still some doubts about why this is discussed in the language documentation rather than directly in ?names, here is a short recap of this property of the R language.

  • You can give a function any name you wish (with some restrictions, of course), and the name has no effect on how the function behaves when it is called "normally".
  • However, if you give a function a name ending in <-, it becomes a replacement function, and R can apply it through the mechanism described at the beginning of this answer when it is invoked with the syntax foo(x)<-value. Note that you never call foo<- explicitly; the slightly different syntax, together with the function's name, is what produces the object replacement.
  • Although there are no formal restrictions, it is common in R to define a getter and a setter under the same base name (for instance names and names<-). In that case, the function with the <- suffix is the replacement function for the corresponding function without the suffix.
  • As stated at the beginning, this behaviour is general and a property of the language, so it does not need to be discussed in the documentation of each individual replacement function.
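To make the recap concrete, here is a small sketch with a made-up replacement function. The name `second<-` is hypothetical and chosen only for illustration; the one thing the mechanism relies on is that the last argument is called `value`. No matching getter `second` is defined, and for this simple case none is needed.

```r
# Hypothetical replacement function: returns a copy of x with element 2 replaced.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- 1:5
orig <- v

plain <- `second<-`(v, 10L)   # ordinary call: v itself is not rebound
identical(v, orig)            # TRUE

second(v) <- 10L              # sugar for: v <- `second<-`(v, value = 10L)
v                             # 1 10 3 4 5
identical(v, plain)           # TRUE
```

Called by its full name, the function behaves like any other; only the `second(v) <- ...` syntax triggers the rebinding of `v`.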
nicola
  • That's some strange wording. I can't even tell what "The replacement function has the same name with <- pasted on" is talking about. **The** replacement function? And what the heck is the documentation for `"names<-"(x,value)` doing outside of the documentation for `names(x)<-value`? – J. Mini Dec 22 '20 at 10:38
  • What you've added is insightful, but I can't see the relevance to my comment. – J. Mini Dec 22 '20 at 19:57
  • So, try to explain better. What's your issue precisely? I have shown what a replacement function is in R and where it is documented. – nicola Dec 22 '20 at 20:21
  • My comment asks two questions: What does the documentation mean by "**the** replacement function" and why is the documentation of `"names<-"` not in the same place as that of `names(x)<-value`? Your edit covers the general notion of replacement functions, but I'm specifically asking about the word "the". You've covered what replacement functions are, but I don't know what the docs mean by "**The** replacement function...". As for my documentation issue, I'm trying to express surprise about how I apparently need to go to the language definition instead of `?names`. My question is: "is this normal?". – J. Mini Dec 22 '20 at 21:12
  • @J.Mini “the replacement function [corresponding to a function `f`]” is “the function `` `f<-` ``”. So `` `names<-` `` is the replacement function of `names`. And there’s only *one* documentation for `` `names<-` ``; it’s all in one place. What is in a *different* place (quoted in this answer) is R’s documentation of the *syntax* for replacement functions in general. – Konrad Rudolph Dec 23 '20 at 09:40
  • It’s worth noting that using quotes around replacement functions is *discouraged*, even though the R manual itself unfortunately and confusingly uses them here. Please always use backticks to avoid confusion with actual strings, i.e. write `` `names<-` ``, not "names<-" (the R documentation of `?Quotes` confirms that this is preferred). The fact that straight quotes even work at all here is a historic artefact and, in hindsight, a mistake. – Konrad Rudolph Dec 23 '20 at 09:42
  • @KonradRudolph Gosh. All of this would've been much simpler if the manual just used the word "corresponding". Thanks for that. – J. Mini Dec 23 '20 at 10:52

> In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.

That’s because `names<-` [1] is a regular function, albeit with an odd name [2]. It performs no assignment; it returns a new object with the names attribute set. In fact, `names<-` is a primitive function in R, but it could be implemented as follows (there are shorter, better ways of writing this in R, but I want the separate steps to be explicit):

`names<-` = function (x, value) {
    new = x
    attr(new, 'names') = value
    new
}

That is, it

  • … creates a new object that’s a copy of x,
  • … sets the names attribute on that newly created object, and
  • … returns the new object.

Since virtually all objects in R are immutable, this fits naturally into R’s semantics. In fact, a better name for this exact function would be with_names [3]. But the creators of R found it convenient to be able to write such an assignment without repeating the name of the object. So instead of writing

x = with_names(x, c('foo', 'bar'))

or

x = `names<-`(x, c('foo', 'bar'))

R allows us to write

names(x) = c('foo', 'bar')

R handles this syntax specially by internally converting it to another expression, documented in the Subset assignment section of the R language definition, as explained in the answer by Nicola.
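One way to see that the special handling belongs to the assignment syntax rather than to the function is to look at the unevaluated expression. The sketch below (the variable name `e` is just for illustration) shows that the expression is stored as an ordinary call to `<-` whose target is the call `names(x)`; nothing in it mentions `names<-` until R evaluates it.

```r
e <- quote(names(x) <- c("foo", "bar"))

identical(e[[1]], as.name("<-"))     # TRUE: the outer call is to `<-`
identical(e[[2]], quote(names(x)))   # TRUE: the assignment target is itself a call

x <- c(1, 2)
eval(e)                              # evaluating the expression rebinds x
names(x)                             # "foo" "bar"
```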

But the gist is that names(x) = y and `names<-`(x, y) are different because … they just are. The former is a special syntactic form that R recognises and transforms when it evaluates the assignment. The latter is a regular function call, and the weird function name is a red herring: it doesn’t affect the execution whatsoever. It does the same as if the function were named differently, and you can confirm this by assigning it a different name:

with_names = `names<-`
`another weird(!) name` = `names<-`

# These are all identical:

`names<-`(x, y)
with_names(x, y)
`another weird(!) name`(x, y)

[1] I strongly encourage using backtick quotes (`) instead of straight quotes (' or ") to quote R variable names. While both are allowed in some circumstances, the latter invites confusion with strings, and is conceptually bonkers. These are not strings. Consider:

"a" = "b"
"c" = "a"

Rather than copy the value of a into c, what this code actually does is set c to literal "a", because quotes now mean different things on the left- and right-hand side of assignment.

The R documentation confirms that

The preferred quote [for variable names] is the backtick (`)

[2] Regular variable names (aka “identifiers” or just “names”) in R can only contain letters, digits, underscore and the dot, must start with a letter or with a dot not followed by a digit, and can’t be reserved words. But R allows pretty much arbitrary characters (including punctuation and even spaces!) in variable names, provided the name is backtick-quoted.
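A rough way to check these rules is sketched below. The helper `is_syntactic` is hypothetical and not part of base R; it assumes that `make.names()` leaves a name unchanged exactly when the name is already syntactically valid.

```r
# Hypothetical helper: TRUE when `n` is usable as a name without backticks.
is_syntactic <- function(n) make.names(n) == n

is_syntactic("x1")        # TRUE
is_syntactic(".x")        # TRUE
is_syntactic(".2x")       # FALSE: a dot followed by a digit
is_syntactic("if")        # FALSE: reserved word
is_syntactic("names<-")   # FALSE: contains punctuation

# Non-syntactic names still work when backtick-quoted:
`my variable!` <- 42
`my variable!` + 1        # 43
```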

[3] In fact, R has an almost-alias for this function, called setNames, which isn’t a great name, since set… implies mutating the object, but of course it doesn’t do that.
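A short sketch of that last point, using `stats::setNames` (the variable names are illustrative): the original vector is left untouched, and the result matches what the assignment form produces on a copy.

```r
x <- c(1, 2, 3)

a <- stats::setNames(x, c("A", "B", "C"))   # returns a renamed copy

b <- x
names(b) <- c("A", "B", "C")                # assignment form applied to a copy

names(x)          # NULL: x itself was not modified by setNames
identical(a, b)   # TRUE
```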

Konrad Rudolph