I am trying to create a data.frame with column names specified. When I input the following:

df_ht <- data.frame("Teams" = NA, "Shots" = NA, "Shots On Target" = NA)

I get the following header:

              Teams Shots Shots.On.Target
1                NA    NA              NA

Then I put spaces next to the names to try spacing out the actual column names:

df_ht <- data.frame(" Teams " = NA, " Shots " = NA, " Shots On Target " = NA) 

And I got this:

          X.Teams. X.Shots. X.Shots.On.Target.
1           NA            NA                 NA

Why did the X and the . appear? How can I get rid of the .?

Ricardo Oliveros-Ramos
Concerned_Citizen
  • Read the documentation at `?data.frame` and pay particular attention to the `check.names` argument. Follow any links provided there. – joran Jan 14 '14 at 17:45
  • Found a solution at http://stackoverflow.com/questions/3411201/specifying-column-names-in-a-data-frame-changes-spaces-to. I need to set check.names to FALSE. – Concerned_Citizen Jan 14 '14 at 17:57
  • Indeed, just as it says in the documentation. – joran Jan 14 '14 at 18:07

1 Answer

When you create a data.frame, the option check.names = TRUE is set by default. This means R checks that the names you provide are syntactically valid and, where they are not, adjusts them using make.names.

From ?make.names, a syntactically valid name:

[..] consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.

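Invalid characters are replaced by dots, and an "X" is prepended when the name does not start with a valid character. That is why " Teams " becomes X.Teams.: the leading space is not a valid first character, so the X is added, and both spaces are turned into dots.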
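
As a minimal sketch of the rule above, calling make.names directly on the names from the question reproduces the conversion that data.frame applies when check.names = TRUE. The variable names plain, padded, fixed_plain and fixed_padded are only for illustration:

```r
# Names from the question, without and with surrounding spaces
plain  <- c("Teams", "Shots", "Shots On Target")
padded <- c(" Teams ", " Shots ", " Shots On Target ")

# make.names() is the function data.frame() relies on when check.names = TRUE
fixed_plain  <- make.names(plain)
fixed_padded <- make.names(padded)

print(fixed_plain)   # "Teams" "Shots" "Shots.On.Target"
print(fixed_padded)  # "X.Teams." "X.Shots." "X.Shots.On.Target."
```

These match the headers shown in the question: interior spaces become dots, and a leading space additionally triggers the X prefix.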

If you do

df_ht <- data.frame(" Teams " = NA, " Shots " = NA,
                    " Shots On Target " = NA, check.names=FALSE)

you will get what you want, but this is not recommended. Referring to the variables inside your data.frame becomes awkward: you need backticks around each column name, and you may lose autocompletion. The purpose of column names is to let you refer to and manipulate the columns, as in df_ht$Teams, not to look good when printed.
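
To illustrate the trade-off, here is a small sketch of what access looks like once the names contain spaces, followed by one possible alternative: keep syntactic names for the actual work and relabel a copy only for display. The objects df_work and df_show are hypothetical names introduced for this example, and the relabelling step is a suggestion, not something taken from the documentation:

```r
# Column names kept exactly as typed, spaces included
df_ht <- data.frame("Teams" = NA, "Shots" = NA, "Shots On Target" = NA,
                    check.names = FALSE)

# A name containing spaces needs backticks with `$` ...
df_ht$`Shots On Target`
# ... or a quoted string with `[[`
df_ht[["Shots On Target"]]

# Possible alternative: work with syntactic names,
# and swap dots for spaces on a copy that is used only for printing
df_work <- data.frame(Teams = NA, Shots = NA, Shots.On.Target = NA)
df_show <- df_work
names(df_show) <- gsub(".", " ", names(df_show), fixed = TRUE)
print(df_show)
```

With this approach df_work$Shots.On.Target stays easy to type, while df_show carries the more readable header.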

Ricardo Oliveros-Ramos