46

Let's say I have a data.frame, like so:

x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
df <- data.frame("Label 1"=x,"Label 2"=rnorm(100))

head(df,3)

returns:

  Label.1    Label.2
1       1  1.9825458
2       2 -0.4515584
3       3  0.6397516

How do I get R to stop automagically replacing the space with a period in the column name? That is, I want "Label 1" instead of "Label.1".

smci
Brandon Bertelsen

4 Answers

96

You may set check.names = FALSE in data.frame (as well as in read.table):

df <- data.frame("Label 1" = 1:3, "Label 2" = rnorm(3), check.names = FALSE)

returns:

  Label 1    Label 2
1       1  0.2013347
2       2  1.8823111
3       3 -0.5233811

From ?data.frame:

check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are.


From ?make.names:

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.

All invalid characters are translated to "."


Also, if you need to subset a variable with an 'invalid' name using $, you can use backticks `. For example:

df$`Label 1`
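
Since check.names is also accepted by read.table (and so by read.csv), the same idea applies when the names come from a file header. A minimal sketch, assuming a small inline CSV (the data and the names csv_text and df_raw are made up for illustration):

```r
# Inline CSV whose header contains spaces (illustrative data only).
csv_text <- "Label 1,Label 2\n1,0.5\n2,1.5\n3,2.5"

# check.names = FALSE keeps the header text exactly as written.
df_raw <- read.csv(text = csv_text, check.names = FALSE)
names(df_raw)

# Non-syntactic names can be reached with a string inside [[ ]],
# or with $ plus backticks.
first_col  <- df_raw[["Label 1"]]
second_col <- df_raw$`Label 2`
```

Using [[ ]] with a string avoids backticks entirely, which can be handy when the column name is held in a variable.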
Community
Brandon Bertelsen
13

You don't.

With the space you want, the name would not satisfy the requirements for an identifier, which come into play when you use df$column.1; that syntax cannot cope with a space. See the make.names() function for details, or this example:

> make.names(c("Foo Bar", "tic tac"))
[1] "Foo.Bar" "tic.tac"

Edit eleven years later: The answer still stands in that R prefers column names that are valid variable names. But R is flexible: if you insist, you can use the other form, but then you need to request the not-otherwise-valid-within-the-language column names explicitly:

> x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
> df <- data.frame("Label 1"=x,"Label 2"=rnorm(100), check.names=FALSE)
> summary( df$`Label 2` )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.2719 -0.7148 -0.0971 -0.0275  0.6559  2.5820 

So by saying check.names=FALSE we override the default (and sensible) check, and by wrapping the identifier in backticks we can access the column.

Dirk Eddelbuettel
  • Hrmmm, this is for output purposes. The data.frame will not be used for further calculations at this point (ie, it's going straight to write.table()) – Brandon Bertelsen Aug 05 '10 at 01:59
  • It's a language requirement. You can create your own pretty printing functions that do the substitution *for output* but you cannot change the way the data.frame is created. – Dirk Eddelbuettel Aug 05 '10 at 02:16
  • 2
    @Brandon, you can specify `col.names` in `write.table`. Something like `col.names=gsub("\\."," ",colnames(df))` should do the trick. – Joshua Ulrich Aug 05 '10 at 02:19
  • 2
    Agree with the above comments. If it's for formatted output, then specify the space as part of the output process. Spaces in identifiers are just asking for trouble, which is why they are discouraged/disallowed. – neilfws Aug 05 '10 at 02:40
  • 4
    I downvoted this a long time ago. But it's proved to be one of the "gotchas" that's worked its way into my historical code (it causes all sorts of 'other' problems). So, it's getting the checkmark so passersby learn from my mistake. – Brandon Bertelsen Mar 27 '12 at 06:21
  • 2
    I realize this is very old at this point, but I needed the same thing for a table in a knitr report and while I understand "you don't", I've found that for the purposes of a report I need nicely formatted labels. The gsub thing sort of works for me. I've upvoted your answer below accordingly. I believe it is the real answer to your question. – bhive01 Sep 14 '15 at 17:19
  • 1
    It is not true that it is a "requirement" to have no spaces in the names, however recommended. You create them with `check.names = F` as described, and access as ``df$`column.1` ``. Akin to double quotes in SQL for names that would clash with keywords, etc. – Daniel Sparing Oct 11 '17 at 07:30
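
The col.names suggestion from the comments above can be sketched as follows: keep syntactic names inside the data frame and substitute spaces only when writing. This is a minimal illustration, assuming tab-separated output to a temporary file (df_out and out_file are names invented here):

```r
# Data frame with ordinary, syntactically valid names (illustrative data).
df_out <- data.frame(Label.1 = 1:3, Label.2 = c(0.5, 1.5, 2.5))

# Hypothetical output location; a temp file keeps the sketch self-contained.
out_file <- tempfile(fileext = ".txt")

# Replace dots with spaces in the header only at write time.
write.table(df_out, out_file,
            col.names = gsub("\\.", " ", colnames(df_out)),
            row.names = FALSE, sep = "\t", quote = FALSE)

# Read back the first line to inspect the header that was written.
header_line <- readLines(out_file, n = 1)
```

Note that gsub replaces every dot, so a name that legitimately contains a period would also be changed.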
5

You can change an existing data frame's names to contain spaces, e.g. using your example:

x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
df <- data.frame("Label 1"=x,"Label 2"=rnorm(100))
colnames(df) <- c("Label 1", "Label 2")
head(df, 3)

returns

  Label 1    Label 2
1       1  0.2013347
2       2  1.8823111
3       3 -0.5233811

and you can still access the columns using the $ operator; you just need to use double quotes, e.g.

df$"Label 2"[1:3]

returns

[1]  0.2013347  1.8823111 -0.5233811

It seems rather inconsistent to me to auto-convert column names upon data.frame creation but not to do the same during column name alteration, but that's how R works at the moment.
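
The difference described here can be checked directly. A small sketch (df_chk and the other variable names are invented for illustration) comparing the names after creation with the names after assignment:

```r
# data.frame() with the default check.names = TRUE runs the names
# through make.names(), so the space becomes a dot.
df_chk <- data.frame("Label 1" = 1:3, "Label 2" = c(0.5, 1.5, 2.5))
names_on_create <- names(df_chk)

# Assigning names afterwards performs no such check.
names(df_chk) <- c("Label 1", "Label 2")
names_after_assign <- names(df_chk)

# The creation-time names match what make.names() produces.
same_as_make_names <- identical(names_on_create,
                                make.names(c("Label 1", "Label 2")))
```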

Aaron Statham
1
names(df) <- c('Label 1', 'Label 2')
double-beep
Tanmay