
My input file is simple tab-delimited data with 3 columns (`\t` stands for a tab character):

```
cat input

1\thello dolly\t1
2\t#hi\t2
```

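Part of the R script:

```r
...
# IN and the line counter i are defined in the elided part of the script
input <- read.table(IN, header = FALSE, sep = "\t", quote = "", nrows = 1, fileEncoding = "UTF-8")
message(i, ".line; col=", ncol(input))
...
```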

When the R script processes the input file, the 2nd line gets the wrong number of columns:

```
1.line; col=3
2.line; col=2
```

However, when I remove '#' from the input, I get the correct result:

```
1.line; col=3
2.line; col=3
```

Why does the # character influence the number of columns? Isn't it a bug? Other programs, such as awk, always give the correct number of columns.
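A minimal, self-contained reproduction, assuming the two sample lines are supplied through the `text =` argument of `read.table` rather than read from the question's connection `IN` (the names `lines` and `ncols` are illustrative). `comment.char` is left at its default of `"#"`, as in the question's script:

```r
# Sample lines from the question; "\t" inside an R string is a real tab
lines <- c("1\thello dolly\t1", "2\t#hi\t2")

ncols <- integer(length(lines))
for (i in seq_along(lines)) {
  # comment.char is not set, so read.table uses its default "#"
  row <- read.table(text = lines[i], header = FALSE, sep = "\t", quote = "")
  ncols[i] <- ncol(row)
  message(i, ".line; col=", ncols[i])
}
```

On line 2, everything from `#` to the end of the line is treated as a comment and discarded, so only `2` followed by a tab remains, which splits into two fields.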

xhudik
    You should probably read the documentation before assuming something is a bug. See the `comment.char` argument to `read.table`. – joran May 21 '14 at 15:51
  • Thanks @joran, you are right (upvoted). Having '#' as the default comment character is pretty strange (at least to me). Reading the full documentation for every command in every programming language you use would kill efficiency completely, don't you think? – xhudik May 21 '14 at 16:06
    On the subject of `comment.char` and `read.table`, it's also worth knowing that setting `comment.char=""`, besides fixing your problem, should make `read.table` run faster (see `?read.table`), so it's a useful argument to add when reading a large file that doesn't contain comments. – ping May 21 '14 at 16:14
    I agree that that may be a questionable default. However, it is also _very_ old, in a function that is about as ubiquitous as they get. So changing it would break lots and lots and lots of stuff. Also, note the existence of specific `read.csv` and `read.delim` functions that use sensible defaults for specific file types. If you choose to use the most general function, it's up to you to read the docs! :) – joran May 21 '14 at 16:14
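A minimal sketch of the two fixes suggested in the comments, again assuming the sample lines are supplied through `text =` instead of the question's connection `IN` (`lines`, `fixed`, and `all_rows` are illustrative names). Setting `comment.char = ""` makes `#` an ordinary character, and `read.delim`, whose defaults are meant for tab-separated files, already uses `comment.char = ""`:

```r
lines <- c("1\thello dolly\t1", "2\t#hi\t2")

# Fix 1: keep read.table, but turn off comment handling with comment.char = ""
fixed <- vapply(lines, function(l) {
  row <- read.table(text = l, header = FALSE, sep = "\t", quote = "",
                    comment.char = "")
  ncol(row)
}, integer(1), USE.NAMES = FALSE)
message("columns per line: ", paste(fixed, collapse = ", "))

# Fix 2: read.delim already defaults to sep = "\t" and comment.char = "";
# header = FALSE because the sample data has no header row, and quote = ""
# mirrors the question's setting
all_rows <- read.delim(text = lines, header = FALSE, quote = "")
str(all_rows)
```

With either fix, both lines should report 3 columns, and the second field of line 2 should come through as the literal string `#hi`.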
