Please look at the column named "if" (the second column). The difference is that when check.names=FALSE, the "." after "if" disappears.

Sorry about the missing code: I tried to type code that generates the data.frame shown in the picture, but I failed because of the "if". We know that "if" is a reserved word in R (like else, for, while, function). Here I deliberately used "if" as a column name (the 2nd column) to see whether R would do something unexpected. As a workaround, I typed "if" into Excel and saved the sheet as a CSV file so that I could load it with read.csv.

My question: why does "if." change to "if" once I set check.names=FALSE?

  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Oct 31 '21 at 05:51
  • "Column names are often data, and the underlying make.names() transformation is non-invertible, so the default behaviour corrupts data. To avoid this, set check.names = FALSE." (quoted from [link](https://advanced-r-solutions.rbind.io/names-and-values.html)). This triggered my curiosity, so I tried the code above. I am afraid of running into the same problem with big data that has millions of columns, where I would hardly be able to find what went wrong. Thank you so much! – user17234955 Oct 31 '21 at 07:34

1 Answer

?read.csv describes check.names= in a similar fashion:

check.names: logical.  If 'TRUE' then the names of the variables in the
          data frame are checked to ensure that they are syntactically
          valid variable names.  If necessary they are adjusted (by
          'make.names') so that they are, and also to ensure that there
          are no duplicates.
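
As a rough illustration of that adjustment, make.names() can be called directly on a few header strings to see what check.names=TRUE would do to them. The header values below are made up for the example, and unique = TRUE is assumed to mirror the de-duplication mentioned in the help text.

```r
# Hypothetical header strings, chosen only for illustration
headers <- c("x", "if", "a b", "a", "a")

# Reserved words get a trailing dot; invalid characters become "."
fixed <- make.names(headers)
fixed
# [1] "x"   "if." "a.b" "a"   "a"

# unique = TRUE additionally renames duplicates
fixed_unique <- make.names(headers, unique = TRUE)
fixed_unique
# [1] "x"   "if." "a.b" "a"   "a.1"

# The mapping cannot be reversed: two different inputs give the same name
collapsed <- make.names(c("a b", "a.b"))
collapsed
# [1] "a.b" "a.b"
```

This is why "if" shows up as "if." under the default, and why the original header text cannot be recovered from the adjusted names.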

The default (check.names=TRUE) exists so that you can write something like dat$<column-name>. Unfortunately, dat$if fails with Error: unexpected 'if' in "dat$if" because if is a reserved word, so check.names=TRUE renames the column to if., which the parser does not trip over. Note, though, that dat[["if"]] works even when dat$if does not.
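
A minimal sketch of the access options, using an inline CSV (the column x and the values are invented for the example) in place of the file from the question:

```r
# Inline CSV standing in for the asker's file; "x" and the values are made up
csv <- "x,if\n1,2"

# Default: the reserved word is adjusted to "if."
dat_checked <- read.csv(text = csv)
names(dat_checked)
# [1] "x"   "if."
dat_checked$if.          # "if." is a valid name, so $ works

# check.names = FALSE keeps the header exactly as written
dat_raw <- read.csv(text = csv, check.names = FALSE)
names(dat_raw)
# [1] "x"  "if"

# dat_raw$if does not even parse; show that without aborting the script
parse_attempt <- try(parse(text = "dat_raw$if"), silent = TRUE)
inherits(parse_attempt, "try-error")
# [1] TRUE

# Two ways that do work on the unadjusted name
dat_raw[["if"]]
dat_raw$`if`             # backticks quote a non-syntactic name
```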

If you are wondering whether check.names=FALSE is ever a bad thing, consider this:

dat <- read.csv(text = "a,a\n2,3")
dat
#   a a.1
# 1 2   3

dat <- read.csv(text = "a,a\n2,3", check.names = FALSE)
dat
#   a a
# 1 2 3

In the second case, how does one access the second column by name? dat$a returns only 2. However, if you don't need $ or [[ and can instead select columns with a logical index, then dat[,colnames(dat) == "a"] does return both of them.
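
For completeness, here is a small sketch of ways to reach the second "a" column after reading with check.names = FALSE. Repairing the names afterwards with make.unique() is one possible approach, not something the help page prescribes.

```r
dat <- read.csv(text = "a,a\n2,3", check.names = FALSE)

# Name-based access only ever finds the first match
first_a <- dat$a          # 2

# Position-based access reaches the second column
second_a <- dat[[2]]      # 3

# Find every position that carries the duplicated name
idx <- which(names(dat) == "a")
idx
# [1] 1 2

# Optionally de-duplicate the names after the fact
names(dat) <- make.unique(names(dat))
names(dat)
# [1] "a"   "a.1"
dat$a.1                   # 3
```

Reading with check.names = FALSE and de-duplicating explicitly keeps the renaming step visible in the script, which may make surprises easier to trace in wide data.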

r2evans