Let's say we have a file name test.txt
which contains unknown number of columns:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5 6 7 8
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
fill=T
fails when line 8 has more than 5 columns:
read.table('test.txt', header=F, sep='\t', fill=T)
results:
V1 V2 V3 V4 V5
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
6 1 2 3 4 5
7 1 2 3 4 5
8 1 2 3 4 5
9 6 7 8 NA NA
10 1 2 3 4 5
11 1 2 3 4 5
12 6 NA NA NA NA
13 1 2 3 4 5
14 6 NA NA NA NA
15 1 2 3 4 5
16 6 NA NA NA NA
But with skip=3
, everything works fine
read.table('test.txt', header=F, sep='\t', fill=T, skip=3)
We got what we expected:
V1 V2 V3 V4 V5 V6 V7 V8
1 1 2 3 4 5 NA NA NA
2 1 2 3 4 5 NA NA NA
3 1 2 3 4 5 NA NA NA
4 1 2 3 4 5 NA NA NA
5 1 2 3 4 5 6 7 8
6 1 2 3 4 5 NA NA NA
7 1 2 3 4 5 6 NA NA
8 1 2 3 4 5 6 NA NA
9 1 2 3 4 5 6 NA NA
Why would this happen? Was it because fill=T
only check the first 5 rows? Is there any way to work around this?