I am trying to read.table for a tab-delimited file using the following command:
df <- read.table("input.txt", header=FALSE, sep="\t", quote="", comment.char="",
encoding="utf-8")
There should be 30 million
rows. However, after read.table() returns, df contains only ~6 million rows. And there is a warning message like this:
Warning message:
In read.table("input.txt", header = FALSE, sep = "\t", quote = "", :
incomplete final line found by readTableHeader on 'input.txt'
I believe read.table
quits after encountering a special sympbol (ASCII code: 1A Substitute)
in one of the string columns. In the input file, the only special character is tab because it is used to separate columns. Is there anyway to ask read.table to treat any other character as not special?