I am trying to read a tab-delimited file with read.table using the following command:

df <- read.table("input.txt", header=FALSE, sep="\t", quote="", comment.char="", 
                 encoding="utf-8")

The file should contain 30 million rows. However, after read.table() returns, df contains only about 6 million rows, and R prints this warning:

Warning message:
In read.table("input.txt", header = FALSE, sep = "\t", quote = "",  :
  incomplete final line found by readTableHeader on 'input.txt'

I believe read.table quits after encountering a special symbol (ASCII code 0x1A, SUB/Substitute) in one of the string columns. In the input file, the only character that should be treated as special is the tab, because it separates the columns. Is there any way to tell read.table not to treat any other character as special?

gung - Reinstate Monica
Li-wei He

1 Answer

With 30 million rows, I would use fread from the data.table package rather than read.table, because it is faster. You can learn more about it here: http://www.inside-r.org/packages/cran/data.table/docs/fread

 fread(input, sep="auto", encoding = "UTF-8" )

Regarding your issue with read.table, I think the solutions in this question should solve it: 'Incomplete final line' warning when trying to read a .csv file into R
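
If the truncation really is caused by the 0x1A byte being taken as an end-of-file marker when the file is read in text mode (behaviour commonly reported on Windows; this is an assumption, not something confirmed for your file), one possible workaround is to read the lines through a binary-mode connection and then parse them from memory. The sketch below builds a tiny stand-in file with an embedded 0x1A byte; the temporary file and its contents are made up for illustration.

```r
# Build a small tab-delimited file with a 0x1A (SUB) byte inside a string field.
# This file is a hypothetical stand-in for input.txt.
path <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("1\tfoo\n2\tba"), as.raw(0x1A), charToRaw("r\n3\tbaz\n")),
         path)

# Read the lines through a connection opened in binary mode ("rb"),
# so the bytes are not passed through text-mode file handling.
con <- file(path, open = "rb")
lines <- readLines(con, encoding = "UTF-8")
close(con)

# Parse the in-memory lines with the same options as the original call.
df <- read.table(text = lines, header = FALSE, sep = "\t", quote = "",
                 comment.char = "", stringsAsFactors = FALSE)

# Compare the number of lines read with the number of rows parsed.
length(lines)
nrow(df)
```

Note that this keeps all lines in memory in addition to the data frame, so for a file with 30 million rows it may need considerably more RAM than a direct read.table call.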

Community
user5249203