Does fill=TRUE fail when rows with a different number of columns occur after the first 5 rows in read.table?

Question

Let's say we have a file named test.txt that contains an unknown number of columns:

1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5   6   7   8
1   2   3   4   5
1   2   3   4   5   6
1   2   3   4   5   6
1   2   3   4   5   6

fill=T fails when line 8 has more than 5 columns:

read.table('test.txt', header=F, sep='\t', fill=T)

Result:

   V1 V2 V3 V4 V5
1   1  2  3  4  5
2   1  2  3  4  5
3   1  2  3  4  5
4   1  2  3  4  5
5   1  2  3  4  5
6   1  2  3  4  5
7   1  2  3  4  5
8   1  2  3  4  5
9   6  7  8 NA NA
10  1  2  3  4  5
11  1  2  3  4  5
12  6 NA NA NA NA
13  1  2  3  4  5
14  6 NA NA NA NA
15  1  2  3  4  5
16  6 NA NA NA NA

But with skip=3, everything works fine:

read.table('test.txt', header=F, sep='\t', fill=T, skip=3)

We got what we expected:

  V1 V2 V3 V4 V5 V6 V7 V8
1  1  2  3  4  5 NA NA NA
2  1  2  3  4  5 NA NA NA
3  1  2  3  4  5 NA NA NA
4  1  2  3  4  5 NA NA NA
5  1  2  3  4  5  6  7  8
6  1  2  3  4  5 NA NA NA
7  1  2  3  4  5  6 NA NA
8  1  2  3  4  5  6 NA NA
9  1  2  3  4  5  6 NA NA

Why does this happen? Is it because read.table only checks the first 5 rows to decide the number of columns, even with fill=T? Is there any way to work around this?

According to `?read.table` `The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of ‘col.names’ if it is specified and is longer. This could conceivably be wrong if ‘fill’ or ‘blank.lines.skip’ are true, so specify ‘col.names’ if necessary (as in the ‘Examples’).` — akrun, Aug 18 '15 at 07:27
Thank you for your quick response. I've found the answer right in the Examples. — Gahoo, Aug 18 '15 at 07:32

score 5 · Answer 1 · answered Aug 18 '15 at 07:29

I found the answer in the Examples section of ?read.table.

ncol <- max(count.fields('test.txt', sep = "\t"))
read.table('test.txt', header=F, sep='\t', fill=T, col.names=paste0('V', seq_len(ncol)))

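This happens because read.table determines the number of columns from the first five lines only. With fill=T, any later line that has more fields than that is wrapped onto an extra row instead of widening the table. The solution is to specify col.names.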
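
To see the mismatch directly, here is a minimal sketch that rebuilds the question's test.txt in a temporary file and compares the field count of the first five lines with the count for the whole file. The names `rows`, `path`, `n_fields`, `guessed`, `needed`, and `df` are illustrative and not from the original post.

```r
# Recreate the question's test.txt in a temporary file (tab-separated).
rows <- c(rep("1\t2\t3\t4\t5", 7),
          "1\t2\t3\t4\t5\t6\t7\t8",
          "1\t2\t3\t4\t5",
          rep("1\t2\t3\t4\t5\t6", 3))
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# Number of fields on every line of the file.
n_fields <- count.fields(path, sep = "\t")
guessed  <- max(n_fields[1:5])  # width suggested by the first five lines
needed   <- max(n_fields)       # width the whole file actually needs

# Supplying col.names of the full width lets read.table allocate enough columns.
df <- read.table(path, header = FALSE, sep = "\t", fill = TRUE,
                 col.names = paste0("V", seq_len(needed)))
dim(df)
```

Here `guessed` is 5 while `needed` is 8, which is why the unadjusted call wraps the wider lines. With `col.names` supplied, `df` has one row per input line (12) and 8 columns, with NA in the unused cells.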

score 2 · Answer 2 · answered Aug 18 '15 at 07:28

Use col.names = paste0("V", seq_len(N)) within read.table, where N is the maximum number of columns.
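
If N is not known in advance, one option is to wrap the idea in a small function. The sketch below is only an illustration: `read_ragged` is a hypothetical helper, not part of base R, and it assumes a headerless file whose widest line can be found with count.fields.

```r
# Hypothetical helper (not part of base R): read a ragged delimited file by
# sizing col.names from the widest line, as counted by count.fields().
read_ragged <- function(file, sep = "\t", ...) {
  n <- max(count.fields(file, sep = sep), na.rm = TRUE)  # N = widest line
  read.table(file, header = FALSE, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(n)), ...)
}

# Usage on a small ragged file: the only 4-field line comes after line 5.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("1\t2", 6), "1\t2\t3\t4"), tmp)
ragged <- read_ragged(tmp)
dim(ragged)
```

The file is scanned twice (once to count fields, once to read), which is usually acceptable for small and medium files but may be noticeable for very large ones.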

drmariod
