
Consider the following:

df <- data.frame(a = 1, b = 2, c = 3)
names(df[1]) <- "d" ## First method
##  a b c
##1 1 2 3

names(df)[1] <- "d" ## Second method
##  d b c
##1 1 2 3

Neither method returned an error, but the first didn't change the column name, while the second did.

I thought it had something to do with the fact that I'm operating only on a subset of df, but then why does the following, for example, work fine?

df[1] <- 2 
##  a b c
##1 2 2 3
David Arenburg
  • I like Joshua's answer here: http://stackoverflow.com/a/10449502/1270695. He says it's magic. – A5C1D2H2I1M1N2O1R2T1 May 02 '14 at 12:20
  • Joshua's answer refers to the second example, not to the one that didn't do anything... – David Arenburg May 02 '14 at 12:24
  • Ohboyohboy, are you in for a treat: read this (in)famous monograph -- http://www.burns-stat.com/pages/Tutor/R_inferno.pdf – Carl Witthoft May 02 '14 at 13:12
  • @DavidArenburg -- For the benefit of future readers, would you please consider changing your "accept" from gagolews' to BrodieG's answer? (As you can see from comments below, gagolews has even tried to delete his own answer, which isn't optimal. Much better would be for him to leave it, but for you to switch the accept.) – Josh O'Brien May 02 '14 at 18:18


What I think is happening is that replacement into a data frame ignores the attributes of the data frame the new values are drawn from. I am not 100% sure of this, but the following experiments appear to back it up:

df <- data.frame(a = 1:3, b = 5:7)
#   a b
# 1 1 5
# 2 2 6
# 3 3 7

df2 <- data.frame(c = 10:12)
#    c
# 1 10
# 2 11
# 3 12

df[1] <- df2[1]   # in this case `df[1] <- df2` is equivalent

This produces:

#    a b
# 1 10 5
# 2 11 6
# 3 12 7

Notice that the values in df changed, but the names did not. The replacement operator `[<-` only replaces the values; it does not copy over the names of the right-hand side. This is why the name was not updated. I believe this explains all the issues.
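The experiment above can be packaged as a small self-checking script. This is a sketch in base R only (`stopifnot()` simply throws an error if an expectation is false). The last few lines are an extra check that is not part of the original experiment: they suggest a plain named vector behaves the same way, with `[<-` keeping the names of the object being modified.

```r
# Reproduce the experiment: replace column 1 of df with a column named "c"
df  <- data.frame(a = 1:3, b = 5:7)
df2 <- data.frame(c = 10:12)
df[1] <- df2[1]

# The values were copied across ...
stopifnot(identical(df$a, 10:12))
# ... but the name of the right-hand side ("c") was not
stopifnot(identical(names(df), c("a", "b")))

# Extra check (not in the original experiment): a named atomic vector.
# `[<-` updates the value but keeps x's own names, ignoring "z"
x <- c(a = 1, b = 2)
x[1] <- c(z = 5)
stopifnot(identical(names(x), c("a", "b")))
stopifnot(identical(unname(x), c(5, 2)))

print(df)
```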

In the scenario:

names(df[2]) <- "x"

You can think of the assignment as follows (this is a simplification; see the end of the post for more detail):

tmp <- df[2]
#   b
# 1 5
# 2 6
# 3 7

names(tmp) <- "x"
#   x
# 1 5
# 2 6
# 3 7

df[2] <- tmp   # `tmp` has "x" for names, but it is ignored!
#    a b
# 1 10 5
# 2 11 6
# 3 12 7

The last step is an assignment with `[<-`, which doesn't respect the names attribute of the right-hand side.

But in the scenario:

names(df)[2] <- "x"

you can think of the assignment as (again, a simplification):

tmp <- names(df)
# [1] "a" "b"

tmp[2] <- "x"
# [1] "a" "x"

names(df) <- tmp
#    a x
# 1 10 5
# 2 11 6
# 3 12 7

Notice that here we assign directly to the names, instead of assigning into df with `[<-`, which ignores the names of the right-hand side.

df[2] <- 2

works because we are assigning directly to the values, not the attributes, so there are no problems here.


EDIT: based on some commentary from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):

Version 1, `names(df[2]) <- "x"`, translates to:

df <- `[<-`(
  df, 2,
  value = `names<-`(   # `names<-` here returns a renamed one-column data frame
    `[`(df, 2),
    value = "x"
  )
)
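One way to check that this expansion matches the shorthand is to run both and compare the results. The sketch below uses base R only and assumes the `df` from the walkthrough (column `a` already replaced by 10:12); `via_sugar` and `via_calls` are illustrative names, not anything from the original post. The general rule behind the expansion, written with a temporary variable called `*tmp*`, is described in the "Subset assignment" section of the R Language Definition.

```r
# The df from the walkthrough, after df[1] <- df2[1]
df <- data.frame(a = 10:12, b = 5:7)

# Shorthand form from the question
via_sugar <- df
names(via_sugar[2]) <- "x"

# Hand-expanded form: `names<-` renames a temporary one-column copy,
# then `[<-` (dispatching to `[<-.data.frame`) writes its values back
# and ignores the new name
via_calls <- `[<-`(df, 2, value = `names<-`(`[`(df, 2), value = "x"))

stopifnot(identical(via_sugar, via_calls))
stopifnot(identical(names(via_calls), c("a", "b")))  # nothing was renamed
```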

Version 2, `names(df)[2] <- "x"`, translates to:

df <- `names<-`(
  df,
  value = `[<-`(
    names(df), 2, value = "x"
  )
)
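The same comparison for version 2 (again a base-R sketch with illustrative variable names) shows that this time the name really does change, because the final call is `names<-` applied to the data frame itself:

```r
df <- data.frame(a = 10:12, b = 5:7)

# Shorthand form
via_sugar <- df
names(via_sugar)[2] <- "x"

# Hand-expanded form: `[<-` edits a copy of the names vector,
# then `names<-` installs that vector on the data frame
via_calls <- `names<-`(df, value = `[<-`(names(df), 2, value = "x"))

stopifnot(identical(via_sugar, via_calls))
stopifnot(identical(names(via_calls), c("a", "x")))  # renamed this time
```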

It also turns out this is "documented" in The R Inferno, Section 8.2.34 (thanks @Frank):

right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed       b
#       1       2
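To illustrate that the pitfall is not specific to data frames or atomic vectors, here is the same pair of assignments on a named list. This is an extension of the Inferno example rather than part of it, and it relies only on base R:

```r
right <- wrong <- list(a = 1, b = 2)

# Renames a temporary copy; `[<-` then writes back the value and drops that name
names(wrong[1]) <- "changed"
# Edits the names attribute of `right` itself
names(right)[1] <- "changed"

stopifnot(identical(names(wrong), c("a", "b")))
stopifnot(identical(names(right), c("changed", "b")))
```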
David Arenburg
BrodieG