
Can anybody explain why the two data frames below, df1 and df2, differ in their column names?

  df1 <- data.frame(a = 1:5, b = 11:15)
  df1
  #   a  b
  # 1 1 11
  # 2 2 12
  # 3 3 13
  # 4 4 14
  # 5 5 15

  df2 <- data.frame(a <- 1:5, b <- 11:15)
  df2
  #   a....1.5 b....11.15
  # 1        1         11
  # 2        2         12
  # 3        3         13
  # 4        4         14
  # 5        5         15
Sowmya S. Manian
    In the first case you're passing named `...` arguments. In the second you are passing the result of a function call, and R `deparse`s the expression to name the column. The second case is similar to a call like `data.frame(1:5 + sum(3^2 - 10:3) - 28.8)`. To make valid names, R uses something like `make.names(deparse(substitute(a <- 1:5)))`. Also, in the second case you've assigned the objects `a` and `b` in your `.GlobalEnv` – alexis_laz Sep 12 '16 at 10:52
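The naming step described in that comment can be sketched in a few lines. This is only an illustration of the idea, not the actual internals of `data.frame`; the helper `arg_label` is a hypothetical name introduced here.

```r
# Hypothetical helper: return the source text of an unnamed argument,
# roughly the way a default column name is derived from it.
arg_label <- function(x) deparse(substitute(x))

# substitute() captures the expression without evaluating it,
# so this call by itself does not assign anything.
label <- arg_label(a <- 1:5)
label
# [1] "a <- 1:5"

# make.names() replaces every character that is invalid in a name with "."
fixed <- make.names(label)
fixed
# [1] "a....1.5"
```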

2 Answers

If you want the column names to be a and b, the correct syntax is

data.frame(a=1:5, b=1:5)

In the statement

data.frame(a <- 1:5, b <- 1:5)

no column names are provided, so R uses the text of the entire expression 'a <- 1:5' as the first column name. That text contains 2 spaces and 3 other characters ('<', '-', ':') that are not allowed in a column name, so each of the five is replaced with '.', which gives 'a....1.5'. The same happens for the second column.
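One way to see that the dots come from name checking is to switch that check off with the `check.names` argument of `data.frame`. A minimal sketch, using the values from the question (the variable names `raw` and `fixed` are chosen here for illustration):

```r
# With check.names = FALSE the expression text is kept as the column name.
# Note: each `<-` also assigns `a` and `b` in the environment
# where this code is run.
raw <- data.frame(a <- 1:5, b <- 11:15, check.names = FALSE)
names(raw)
# [1] "a <- 1:5"   "b <- 11:15"

# With the default check.names = TRUE the names are made syntactically valid.
fixed <- data.frame(a <- 1:5, b <- 11:15)
names(fixed)
# [1] "a....1.5"   "b....11.15"
```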

Sandipan Dey

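Inside a function call, the <- operator still performs an assignment, so besides passing the value it creates the object in the calling environment. The = operator inside a call only names the argument and creates no object.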

Renaming columns slightly:

df1 <- data.frame(a1 = 1:5, b1 = 11:15)
df1
#   a1 b1
# 1  1 11
# 2  2 12
# 3  3 13
# 4  4 14
# 5  5 15

# The objects are created only in the data frame, not in the environment
exists(x = "a1")
# [1] FALSE
exists(x = "b1")
# [1] FALSE


# Here the objects are created both in the data frame and in the environment

df2 <- data.frame(a2 <- 1:5, b2 <- 11:15)
df2
#   a2....1.5 b2....11.15
# 1         1          11
# 2         2          12
# 3         3          13
# 4         4          14
# 5         5          15

exists(x = "a2")
# [1] TRUE
exists(x = "b2")
# [1] TRUE
Silence Dogood
    This is useful, but it is already covered in the other question I referenced in the comment. It would be more informative if you explained how `data.frame` processes the call to derive the column names. – jakub Sep 12 '16 at 11:04