These four ways of creating a dataframe look pretty similar to me:

myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
myData2 <- data.frame(a = c(1,2), b = c(3,4))
myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))

But if I print the column names, I only get the clean names I was hoping for when I use the infix = operator. In all the other cases, the whole expression becomes the column name, with every non-alphanumeric character replaced by a period:

> colnames(myData1)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData2)
[1] "a" "b"
> colnames(myData3)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData4)
[1] "a...c.1..2." "b...c.3..4."

I've read about differences between <- and = when used in function calls in terms of variable scope, but as far as I can reason (possibly not very far), that doesn't explain this particular behavior.

  1. What accounts for the difference between = and <-?
  2. What accounts for the difference between the prefix and infix versions of =?
sudo make install
  • Well, to be fair, your examples 1, 3 and 4 are very rare, bordering on pathological. – joran Aug 19 '15 at 19:58
  • Haha I love that description. I added examples 3 and 4 as a curiosity after finding the linked post, which says they should be equivalent. But the first example is what I typed naturally (read: naively). Could you explain what's wrong with them? – sudo make install Aug 19 '15 at 20:02
  • The root of the "problem" is `list(a <- 1:2)`, but that descends to R's internals almost immediately. – joran Aug 19 '15 at 20:09

2 Answers

When you call a function, including data.frame, an = between a name and a value is not the assignment operator. It only tells R which parameter (or, for arguments collected by ..., which name) the value is attached to.
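One way to see the difference is to inspect how R parses such a call without evaluating it. The following is a minimal sketch, not part of the original answer; it relies only on base R's quote(), names() and is.call(), and the variable names are made up for illustration.

```r
# Capture the call unevaluated so we can look at its structure.
call_expr <- quote(data.frame(a = c(1, 2), b <- c(3, 4)))

# names() on a call lists the argument names; the first slot is the function.
arg_names <- names(call_expr)[-1]
print(arg_names)  # "a" for the first argument, "" for the second

# The second argument is itself an unnamed call to `<-`.
second_arg <- call_expr[[3]]
print(is.call(second_arg))
print(identical(second_arg[[1]], as.name("<-")))
```

Only the infix `a = ...` form is recorded as an argument name; `b <- ...` arrives as an ordinary expression with no name attached.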

Setting aside data.frame(a = c(1,2), b = c(3,4)), in each of the other calls <- and = are evaluated as ordinary assignments and create the variables a and b in your environment.

> ls()
character(0)
> myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
> ls()
[1] "a"       "b"       "myData1"
> rm(list=ls())
> ls()
character(0)
> myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
> ls()
[1] "a"       "b"       "myData3"
> rm(list=ls())
> ls()
character(0)
> myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))
> ls()
[1] "a"       "b"       "myData4"

The data frame still gets the expected values only because <- and = invisibly return the value they assign.

> foo <- `=`(a,c(1,2))
> foo
[1] 1 2

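Because of that, your data.frame calls produce the same values, ignoring the assignment side effect, as the call below. In every such case the arguments are unnamed, so the column names are generated from the argument expressions themselves: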

> data.frame(c(1,2), c(3, 4))
  c.1..2. c.3..4.
1       1       3
2       2       4

hence the results you see.
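Both the side effect and the invisible return value can be checked in isolation. This is a sketch under the assumption that evaluating the call in a throwaway environment (here called scratch, a name chosen for illustration) is an acceptable stand-in for the interactive session above.

```r
# Evaluate the call inside a scratch environment so the workspace stays clean.
scratch <- new.env()
df <- evalq(data.frame(a <- c(1, 2), b <- c(3, 4)), scratch)

# The assignments ran as a side effect of evaluating the arguments.
created <- sort(ls(scratch))
print(created)

# `<-` returns its value invisibly; withVisible() exposes the visibility flag.
vis <- local(withVisible(tmp <- c(1, 2)))
print(vis$visible)

# The columns hold the assigned values; the names come from the expressions.
print(names(df))
print(df)
```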

zero323

When you pass a <- c(1,2) as an argument to data.frame, there is a value for the first argument, but no name attached to it. data.frame collects its arguments into a list with as.list. Both a and c(1,2) were passed to <-, which assigned the variable a and returned only the value, so there is no name on the argument that gets sent to as.list. You can think of the symbol a as having already been processed and therefore "used up". The default names in that situation are the results of a deparse call on the argument expression.

> make.names(deparse( quote(a <- c(1,2) )) )
[1] "a....c.1..2."
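If the goal is simply to end up with the columns named a and b, either pass the names with = or set them afterwards. The sketch below assumes a hypothetical helper, default_name, that imitates the naming step shown above; it is not data.frame's actual internal code.

```r
# Hypothetical helper: mimic how an unnamed argument gets its default name.
default_name <- function(expr) make.names(deparse(expr, nlines = 1L)[1L])
nm <- default_name(quote(b <- c(3, 4)))
print(nm)

# Option 1: name the arguments in the call.
df_named <- data.frame(a = c(1, 2), b = c(3, 4))

# Option 2: keep the assignment form and repair the names afterwards.
df_fixed <- stats::setNames(data.frame(a <- c(1, 2), b <- c(3, 4)), c("a", "b"))

print(names(df_fixed))
print(isTRUE(all.equal(df_named, df_fixed)))
```

Option 1 is the usual idiom; option 2 still creates a and b in the calling environment as a side effect.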
IRTFM