
I imported an Excel dataset into R, but the name of one column changed slightly, as shown below:

[Screenshot of the imported data (image not available)]

The original name of the first column is "Id", but it seems to have changed slightly. I'm sure I didn't change anything in my original dataset. I just imported it into R and opened it, and it looks like this. What happened? Thanks in advance!

Chris
  • You can use `check.names = FALSE` while reading the dataset. There could be some special characters in the column names that got converted with the default `check.names = TRUE`. – akrun Mar 20 '20 at 23:37
  • OK. So what does `check.names = TRUE` mean? – Chris Mar 20 '20 at 23:43
  • I meant `read.csv("yourfile.csv", check.names = FALSE)`. When it is TRUE, it triggers the `make.names` and `make.unique` functions, which check the column names and change any that are not syntactically valid R names. – akrun Mar 20 '20 at 23:45
  • Thank you so much! Got that! – Chris Mar 20 '20 at 23:47
  • It looks like your file has a UTF-8 byte order mark (BOM) that causes the odd-looking name. Use the suggested fix here: https://stackoverflow.com/questions/21624796/read-a-utf-8-text-file-with-bom – MrFlick Mar 21 '20 at 02:36
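
A minimal sketch of the BOM explanation, assuming the data were saved as a CSV that starts with the UTF-8 BOM bytes EF BB BF. The temporary file and its `Id`/`Value` columns are made up for illustration. The sketch writes such a file and reads it back with `fileEncoding = "UTF-8-BOM"`, which asks R to discard the mark. What a default read produces varies by platform and locale. On Windows the name is commonly reported as something like `ï..Id`.

```r
# Hypothetical example file: the UTF-8 BOM bytes followed by ordinary CSV text
bom_file <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Id,Value\n1,10\n2,20\n")), bom_file)

# The first five bytes are the BOM followed by "I" and "d"
readBin(bom_file, what = "raw", n = 5)

# "UTF-8-BOM" tells the connection to drop a leading byte order mark
dat <- read.csv(bom_file, fileEncoding = "UTF-8-BOM")
names(dat)
# returns c("Id", "Value")
```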

1 Answer


When reading the dataset, use check.names = FALSE in read.csv/read.table to stop R from checking and altering the column names:

dat <- read.csv("file.csv", check.names = FALSE)
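To see the difference, here is a small sketch that reads inline CSV text through read.csv's text argument instead of a file. The headers "1st" and "my col" are made up so that they are not syntactically valid R names. The text is read once with the default and once with check.names = FALSE.

```r
csv_text <- "1st,my col\n1,2\n"

# Default check.names = TRUE: the header names are repaired by make.names()
dat_default <- read.csv(text = csv_text)
names(dat_default)
# returns c("X1st", "my.col")

# check.names = FALSE: the header text is kept exactly as written
dat_raw <- read.csv(text = csv_text, check.names = FALSE)
names(dat_raw)
# returns c("1st", "my col")

# Non-syntactic names then need backticks when used with $
dat_raw$`my col`
```

The trade-off is that unrepaired names must be quoted with backticks in code. Keeping them is still useful when you need the exact header text.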

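check.names = TRUE (the default) makes read.table call make.names with unique = TRUE, which in turn uses make.unique. Together they change any column name that is not a syntactically valid R name. For example, an X is prepended to names that start with a digit, and spaces and other invalid characters are replaced with dots. Duplicated names get suffixes such as .1.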
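The renaming rules can be checked by calling the functions directly. This sketch uses a made-up header vector with a name that starts with a digit, a name with a space, and a duplicate.

```r
# Hypothetical header values
hdr <- c("1st", "my col", "Id", "Id")

# What read.table() applies when check.names = TRUE
make.names(hdr, unique = TRUE)
# returns c("X1st", "my.col", "Id", "Id.1")

# make.unique() on its own only de-duplicates; it leaves invalid names alone
make.unique(hdr)
# returns c("1st", "my col", "Id", "Id.1")
```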


The relevant part of the read.table source is:

...
if (check.names)
    col.names <- make.names(col.names, unique = TRUE)
...

and make.names, when unique = TRUE, calls make.unique:

make.names
function (names, unique = FALSE, allow_ = TRUE) 
{
    names <- as.character(names)
    names2 <- .Internal(make.names(names, allow_))
    if (unique) {
        o <- order(names != names2)
        names2[o] <- make.unique(names2[o])
    }
    names2
}
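One detail in that source is the line order(names != names2). It puts the names that make.names left unchanged first. When make.unique then resolves a clash, an already valid name keeps its spelling and the repaired name gets the suffix. The names in this check are made up: "a 1" is invalid and repairs to "a.1", which clashes with the valid "a.1".

```r
hdr <- c("a 1", "a.1")

# read.table()'s path: the originally valid "a.1" keeps its name
make.names(hdr, unique = TRUE)
# returns c("a.1.1", "a.1")

# Repairing first and then de-duplicating in position order gives a different result
make.unique(make.names(hdr))
# returns c("a.1", "a.1.1")
```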
akrun
  • But my original column name is "Id". I think that's a normal name, so there's no reason for make.names or anything like it to change it. – Chris Mar 20 '20 at 23:54
  • @Chris Can you read with `check.names = FALSE` and show the output of `dput(colnames(yourdata))`? I think there is a special character, or maybe a space, before it. – akrun Mar 20 '20 at 23:55
  • OK. Actually it's a question from my friend and I'm just curious. I will tell him and check if it works! – Chris Mar 21 '20 at 00:00
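
Following the suggestion to inspect the name, here is a sketch of how to reveal what is hiding in it. The string nm is a stand-in for the imported column name. It is assumed, as MrFlick suggests, to start with a BOM character (U+FEFF). Depending on the locale, dput() may not make an invisible character obvious, so nchar(), utf8ToInt() and charToRaw() are used here. clean_names() is a hypothetical helper for fixing names after the data are already loaded.

```r
# Stand-in for the imported name: an invisible U+FEFF before "Id"
nm <- "\ufeffId"

nchar(nm)       # 3, not 2: there is an extra, invisible character
utf8ToInt(nm)   # code points; 65279 is U+FEFF, the byte order mark
# returns c(65279L, 73L, 100L)
charToRaw(nm)   # the same character as UTF-8 bytes: ef bb bf, then "Id"

# Hypothetical cleanup: drop a leading BOM, then surrounding spaces/tabs/newlines
clean_names <- function(x) trimws(sub("^\ufeff", "", x))
clean_names(c(nm, " Id", "Value"))
# returns c("Id", "Id", "Value")
```

Fixing the encoding when reading, for example with fileEncoding = "UTF-8-BOM" as in the linked question, is usually cleaner than repairing the names afterwards.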