
This is related to Assignment operators in R: '=' and '<-'; however, my question is not answered there.

The linked question and answers explain that using <- inside of a function declares the variable assignment in the user workspace, so that the variable can be used after the function is called. (Ed note: that is not actually stated in the linked answer, and if it were stated, it would be wrong. If you made the statement about the evaluation of argument lists and restricted it to calls of such functions from the global environment, it might be correct.)

This would seem to explain the following difference in behavior. The following code produces a data frame exactly as one might expect:

A <- data.frame(
  Sub = rep(c(1:3),each=3),
  Word = rep(c('Hap','Lap','Sap'),3),
  Vowel_Length = sample(c(1:100),9)
  )

The result is:

  Sub Word Vowel_Length
1   1  Hap           31
2   1  Lap            2
3   1  Sap           71
4   2  Hap           58
5   2  Lap           28
6   2  Sap           20
7   3  Hap           78
8   3  Lap           72
9   3  Sap           77

However, if we use <- inside of the data.frame() function, as follows, we get a different result.

B <- data.frame(
  Sub <- rep(c(1:3),each=3),
  Word <- rep(c('Hap','Lap','Sap'),3),
  Vowel_Length <- sample(c(1:100),9)
  )

This result is:

  Sub....rep.c.1.3...each...3. Word....rep.c..Hap....Lap....Sap....3.
1                            1                                    Hap
2                            1                                    Lap
3                            1                                    Sap
4                            2                                    Hap
5                            2                                    Lap
6                            2                                    Sap
7                            3                                    Hap
8                            3                                    Lap
9                            3                                    Sap
  Vowel_Length....sample.c.1.100...9.
1                                  31
2                                  15
3                                   4
4                                   2
5                                  89
6                                  55
7                                  12
8                                  72
9                                  47

I assume that, because using <- inside a function declares the variable globally, the headers of the data frame are inherited from that global declaration, just as the linked question and answers would seem to indicate. [See the comments.]

However, I'm curious why you get, for example, Sub....rep.c.1.3...each...3. as the header of the first column in the data frame instead of Sub <- rep(c(1:3),each=3), or even instead of 1 1 1 2 2 2 3 3 3.

Update:

As @AnandaMahto pointed out in a deleted comment, setting check.names to FALSE produces the following behavior.

C <- data.frame(
  Sub <- rep(c(1:3),each=3),
  Word <- rep(c('Hap','Lap','Sap'),3),
  Vowel_Length <- sample(c(1:100),9),
  check.names=FALSE
)

Where the result is:

  Sub <- rep(c(1:3), each = 3) Word <- rep(c("Hap", "Lap", "Sap"), 3)
1                            1                                    Hap
2                            1                                    Lap
3                            1                                    Sap
4                            2                                    Hap
5                            2                                    Lap
6                            2                                    Sap
7                            3                                    Hap
8                            3                                    Lap
9                            3                                    Sap
  Vowel_Length <- sample(c(1:100), 9)
1                                  15
2                                   3
3                                  82
4                                  33
5                                  99
6                                  53
7                                  89
8                                  77
9                                  47

And to clarify, my question is simply why this behavior is happening. In particular, why do you get Sub....rep.c.1.3...each...3. as a header, instead of Sub <- rep(c(1:3),each=3) or 1 1 1 2 2 2 3 3 3, with check.names=TRUE (the default)?

And now, I suppose I'm also curious why you get Sub <- rep(c(1:3),each=3) as the header with check.names=FALSE.

Adam Liter
  • You get the original expression as a name due to lazy evaluation, which makes it possible for the `data.frame` function to capture the expression "string" prior to its evaluation. BTW, the names with `check.names=TRUE` come from the same strings massaged by `make.names`. Link: http://adv-r.had.co.nz/Computing-on-the-language.html – Ferdinand.kraft Sep 04 '13 at 21:09
  • This is wrong: "using <- inside a function declares the variable globally". – IRTFM Sep 05 '13 at 00:41
  • @DWin telling me what is 'wrong' about it might be helpful so that I can edit the question so that it doesn't contain false information. Or, if you want to, feel free to edit the question yourself. – Adam Liter Sep 05 '13 at 00:47
  • @Adam, `<-` does not declare a variable globally. For example: `f <- function(x) { a <- 1; return(x) }`. Try calling `f(1)` and you'll see variable `a` does not exist. – Scott Ritchie Sep 05 '13 at 00:50
  • @Manetheran thanks for providing a bit more of an explanation. – Adam Liter Sep 05 '13 at 00:59
  • @DWin: would it be correct to say that using `<-` in the parameter of a function call declares the variable in the calling environment? I think that's the sense that Adam perhaps meant. – Aaron left Stack Overflow Sep 05 '13 at 01:01
  • I think I've found the source of your confusion there. If you then do: `f(x = 3)`, the variable `x` won't exist (unless it does already). However if you do `f(x <- 3)`, `x` will be created in your current environment/stack. – Scott Ritchie Sep 05 '13 at 01:02
  • @Manetheran Yes, that's what I took away from the linked question ([Assignment operators in R: '=' and '<-'](http://stackoverflow.com/questions/1741820/assignment-operators-in-r-and)). I guess I mis-paraphrased the answer given to that question. I'm a newcomer to `R`, so I'm still learning. – Adam Liter Sep 05 '13 at 01:04
  • Within a function, assignment with `<-` or any of the methods such as `[<-` or `[[<-` will only create a named value in the local environment of that function. Any function called from within that function after the assignment will be able to see those named values, but after the function is exited, the named value will be garbage collected (unless it was the last evaluated expression or `return()`-ed) and will not exist in what most people would call the "user workspace". – IRTFM Sep 05 '13 at 01:27
  • @DWin Thanks for clarifying! =] I appreciate it. Like I said, I'm new to `R` and really just trying to learn *why* these things are happening so that I have a better understanding of the language. – Adam Liter Sep 05 '13 at 01:32
  • The evaluation model is called "lexical scoping". The behaviors of `<-` and `=` are different inside argument lists of functions, as you were demonstrating, but are generally the same elsewhere. – IRTFM Sep 05 '13 at 01:33
  • I also see that the phrase "inside a function" is ambiguous, and that your interpretation, meaning expressions "evaluated in the argument list", is arguably as correct as my interpretation, meaning "within the body of the function". Take a look at `?alist` and `?formals`. – IRTFM Sep 05 '13 at 01:41
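The distinction drawn in these comments can be checked with a minimal sketch, assuming a fresh R session; `f`, `g` and the variable names are hypothetical. An assignment in a function body stays in that function's local environment, an assignment written as an argument is evaluated in the caller's environment, and, because arguments are lazily evaluated, it only happens if the function actually uses that argument.

```r
# f assigns `a` in its body and returns its argument, which forces it.
f <- function(x) {
  a <- 1
  x
}

# g never uses its argument, so the argument expression is never evaluated.
g <- function(x) NULL

f(1)
exists("a")           # FALSE: `a` lived only in f's local environment

f(b <- 3)
exists("b")           # TRUE: `b <- 3` was evaluated in the caller's environment

g(never_made <- 3)
exists("never_made")  # FALSE: lazy evaluation, the promise was never forced
```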

3 Answers

It appears that your question is about the strange naming that R ends up using, and you're wondering why it doesn't have spaces, <, and so on.

If that's your actual question, you should look at the check.names argument in data.frame.

From ?data.frame:

check.names logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are.

Thus, you can get the names you were expecting by setting check.names to FALSE:

B <- data.frame( Sub <- rep(c(1:3),each=3), 
                 Word <- rep(c('Hap','Lap','Sap'),3), 
                 Vowel_Length <- sample(c(1:100),9),
                 check.names = FALSE)
B
#   Sub <- rep(c(1:3), each = 3) Word <- rep(c("Hap", "Lap", "Sap"), 3)
# 1                            1                                    Hap
# 2                            1                                    Lap
# 3                            1                                    Sap
# 4                            2                                    Hap
# 5                            2                                    Lap
# 6                            2                                    Sap
# 7                            3                                    Hap
# 8                            3                                    Lap
# 9                            3                                    Sap
#   Vowel_Length <- sample(c(1:100), 9)
# 1                                  33
# 2                                  20
# 3                                   5
# 4                                  83
# 5                                  99
# 6                                  79
# 7                                  58
# 8                                  46
# 9                                  44
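To see where the mangled header in the question comes from, one can, as a sketch, apply the `make.names()` step by hand to the deparsed argument expression. This assumes, as the comments on this answer describe, that `data.frame()` labels an unnamed column with its deparsed argument expression and that `check.names = TRUE` then runs that label through `make.names()`.

```r
# The unnamed argument's expression, deparsed the way data.frame() labels it.
label <- deparse(quote(Sub <- rep(c(1:3), each = 3)))
label
# [1] "Sub <- rep(c(1:3), each = 3)"

# With check.names = TRUE the label goes through make.names(), which
# replaces every character that is not a letter, digit, "." or "_" with ".".
make.names(label)
# [1] "Sub....rep.c.1.3...each...3."
```

Each space, `<`, `-`, bracket, colon, comma and `=` becomes one dot, which is exactly the header shown in the question.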
A5C1D2H2I1M1N2O1R2T1
  • This answers part of the question, though it's really `?make.names` that provides the relevant information: "A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. [. . .] All invalid characters are translated to '.'." However, it still doesn't answer why `data.frame()` is taking the entire string to be the header of the column. If anything, I would have expected `1 1 1 2 2 2 3 3 3` to be the column header. Why doesn't that happen? – Adam Liter Sep 05 '13 at 00:52
  • @Adam, the answer is *lazy evaluation*. The expression `Sub <- rep(c(1:3), each = 3)` is passed to the function `data.frame()` as is, and this function captures it to generate a text representation via `deparse()`. It is only when the values are filled into the data frame that the expression is evaluated and the sequence `1 1 1 2 ...` is computed. – Ferdinand.kraft Sep 17 '13 at 21:10
  • @Ferdinand.kraft thanks for the further clarification. With the addition of your comment, I'll go ahead and accept this answer. Thank you both! – Adam Liter Jan 29 '14 at 00:45
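The lazy-evaluation point above can be sketched with `substitute()`: a function can grab its unevaluated argument expressions and deparse them into labels without ever running them. `arg_labels()` is a hypothetical helper that only approximates how `data.frame()` derives default column names; it is not the actual `data.frame()` source.

```r
# Hypothetical helper: text of each argument expression, without evaluating it.
arg_labels <- function(...) {
  # substitute() replaces `...` with the unevaluated argument expressions
  exprs <- as.list(substitute(list(...)))[-1L]
  vapply(exprs, function(e) paste(deparse(e), collapse = " "),
         character(1), USE.NAMES = FALSE)
}

labs <- arg_labels(Sub <- rep(c(1:3), each = 3),
                   Vowel_Length <- sample(c(1:100), 9))
labs[1]  # "Sub <- rep(c(1:3), each = 3)"

# Neither assignment ran, because the promises were never forced.
exists("Sub")           # FALSE in a fresh session
exists("Vowel_Length")  # FALSE in a fresh session
```

`data.frame()` differs from this sketch in that it later does force the arguments to get the column values, which is when the assignments take effect.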

In the first case you have a named list and in the second an unnamed one. Here's an illustrative example:

f = function(...) {
  l = list(...)
  print(names(l))
}

f(a = 4, b = 5)
#[1] "a" "b"

f(a <- 4, b <- 5)
#NULL

From that point on, for unnamed arguments data.frame falls back to the deparsed argument expression as the column name, and, if check.names is TRUE, passes that name through make.names.
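A small sketch of this naming rule, assuming a fresh session (`d1`, `d2` and `x` are made-up names): a named argument supplies its own column name, while an unnamed one gets its deparsed expression, which `check.names` then either cleans up or leaves alone.

```r
# `x <- 1:2` is unnamed (and also assigns x); `y = 3:4` is a named argument.
d1 <- data.frame(x <- 1:2, y = 3:4)
names(d1)
# [1] "x....1.2" "y"

d2 <- data.frame(x <- 1:2, y = 3:4, check.names = FALSE)
names(d2)
# [1] "x <- 1:2" "y"
```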

eddi

As other answers point out, you should read what check.names does.

However, you might also consider this: if you write data.frame( Sub = rep(c(1:3)) ) the equals sign means "bind a named argument called Sub". That named argument stays internal to the call of data.frame().

If you write data.frame( Sub <- rep(c(1:3)) ) the <- sign means "assign into the calling environment", i.e. the environment from which data.frame() is called (your workspace, if you are typing at the console). So you create a variable called Sub in addition to creating the data.frame, and that Sub variable remains in your environment after the data.frame() call returns.
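A minimal sketch of that side effect, run inside `local()` so the assignment lands in a throwaway environment rather than the global one; `scope_check`, `before` and `after` are illustrative names. It shows that the variable appears in whichever environment the `data.frame()` call is made from.

```r
# Run inside local() so the side effect lands in a throwaway environment.
scope_check <- local({
  here <- environment()

  d1 <- data.frame(Sub = 1:3)   # `=` only names the column
  before <- exists("Sub", envir = here, inherits = FALSE)

  d2 <- data.frame(Sub <- 1:3)  # `<-` also assigns Sub where the call is made
  after <- exists("Sub", envir = here, inherits = FALSE)

  c(before = before, after = after)
})
scope_check["before"]  # FALSE
scope_check["after"]   # TRUE
exists("Sub")          # FALSE in a fresh session: the global env was untouched
```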

Steve Pitchers