Please help me understand this rather weird data.frame behavior. When I use the <- operator I get a different and unexpected column name:

x <- data.frame (y <- 1)
a <- data.frame (b = 1)

> colnames(x)
[1] "y....1"
> colnames(a)
[1] "b"

I know the difference between the operators:

> b
Error: object 'b' not found
> y
[1] 1

In this answer the behavior is mentioned. In the comments, data.frame calling make.names("y <- 1") is given as an explanation. I don't follow this reasoning. Is it just a bug that should be removed in the future?

Cœur
PascalIv
  • Please take a look at the linked post and the extended discussions therein; happy to re-open if this does *not* answer your question. – Maurits Evers Jul 31 '18 at 07:41
  • From `?data.frame` you can see that `data.frame` takes arguments of either the form `value` or `tag = value`. So `tag <- value` is treated as a plain `value` with no tag, and its column name is built from the text of the expression (with the invalid characters each replaced by a dot). – Maurits Evers Jul 31 '18 at 07:47
  • It is mentioned there, but no full explanation is given. – PascalIv Jul 31 '18 at 07:54
  • @MauritsEvers If the whole expression gets interpreted as character, why does it still have the value 1? x <- data.frame ("y <- 1") is different from x <- data.frame (y <- 1) – PascalIv Jul 31 '18 at 07:59
  • Did you take a look at `?data.frame`? To cite from the explanation of the `...` argument: *"these arguments are of either the form ‘value’ or ‘tag = value’. Component names are created based on the tag (if present) or **the deparsed argument itself**"* (bold face mine). So if you do `data.frame(y <- 1)` (which you really shouldn't), `y <- 1` first gets deparsed (which stores value 1 for `y`, c.f. `deparse(y <- 1)`), and then stores the return object of `deparse` as a value (since you don't have a `tag = value` argument). – Maurits Evers Jul 31 '18 at 11:32
  • [continued] `data.frame` then does some automatic variable type and column name guessing. It might be enlightening to take a look at the source code of `data.frame` (just type `data.frame` + Enter into an R terminal). It's really not very mysterious (there exist much more diffuse R idiosyncrasies), and it's definitely not a bug, nor is this something that "should be removed in the future". In the end it boils down to you doing something that you *shouldn't* do (i.e. `data.frame(y <- 1)`). – Maurits Evers Jul 31 '18 at 11:43
  • Thank you for the perfect explanation. It is now clear to me what happens. I think I am going to switch to Python for today. – PascalIv Jul 31 '18 at 12:15
  • You're very welcome, and don't give up on R ;-) @PascalIv. Enjoy your Python break. – Maurits Evers Jul 31 '18 at 12:17
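
The mechanism described in the comments can be reproduced step by step. This is a minimal sketch, not a trace of the actual `data.frame` internals: the variable name `deparsed` is only illustrative, and the sketch assumes the default `check.names = TRUE`.

```r
# The untagged argument is evaluated like any other expression,
# so the assignment happens and y exists afterwards.
x <- data.frame(y <- 1)
y                       # 1

# With no `tag =`, the column name starts out as the deparsed
# text of the argument expression ...
deparsed <- deparse(quote(y <- 1))
deparsed                # "y <- 1"

# ... and check.names = TRUE passes it through make.names(),
# which replaces each invalid character with a dot.
make.names(deparsed)    # "y....1"
colnames(x)             # "y....1"

# Turning the check off keeps the deparsed text as the name.
colnames(data.frame(y <- 1, check.names = FALSE))   # "y <- 1"

# The usual form: `b` is only a tag, no variable b is created.
a <- data.frame(b = 1)
colnames(a)             # "b"
exists("b")             # FALSE in a fresh session
```

So the column holds the value 1 because the expression is evaluated, while the name comes from its text; inside a function call, `=` supplies a tag and `<-` performs an assignment.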
