Prevent R from splitting columns to rows during data import

Question

I have a data file with 21 rows, but the number of columns differs from row to row. The first column is the header and the remaining columns contain numbers.

Full text file is here.

So, I import the file using: d <- read.table("data.txt", sep = " ", fill=T). But when I looked at the data using View(), I saw that one row with a lot of columns (160,000+) is split into 3 rows. A picture showing this phenomenon is here.

Why is R doing this? And how can I fix it so I get 21 rows? I'd appreciate any help/pointers. I am using R Studio (64 bit) on Windows 7 with 16 GB memory.

Also, I did look around before posting but did not have much luck. The 'reshape' package seemed at first to be of some help but I couldn't really use it to suit my needs. Any tip to fix the issue during import or post-import would be appreciated. Thanks.

It would be useful if you posted `dput(d)`, so we can import your dataset, instead of making a screenshot of your data. — KenHBS, Sep 12 '16 at 18:08
It's a pretty long file. Instead, I've added a link to the text file. — berge2015, Sep 12 '16 at 18:15
Seems related to [this](http://stackoverflow.com/questions/1874443/import-data-into-r-with-an-unknown-number-of-columns), which is about how `read.table` decided the max number of columns. Some answers there that you can try. — aosmith, Sep 12 '16 at 18:30
You can try http://stackoverflow.com/questions/12626637/reading-a-text-file-in-r-line-by-line approach also and read the lines to a list and then try reshaping data according to your requirement. — tushaR, Sep 12 '16 at 19:25

score 0 · Answer 1 · edited May 23 '17 at 12:32

This happens because read.table only looks at the first 5 lines of test.txt to decide how many columns to create. In your data, the ninth line has more fields than that, and with fill = TRUE the extra fields wrap onto additional rows.
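
To see the effect on a small scale, here is a minimal sketch using a made-up six-line file. The file contents and the names `tmp` and `wrapped` are illustrative only and are not taken from the question's data. The first five lines have 3 fields each and the sixth has 7:

```r
# Hypothetical toy file: five short lines, then one long line
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "a 1 2",
  "b 3 4",
  "c 5 6",
  "d 7 8",
  "e 9 10",
  "f 11 12 13 14 15 16"
), tmp)

# read.table sizes the table from the first five lines (3 columns),
# so with fill = TRUE the extra fields of line six spill onto new rows
wrapped <- read.table(tmp, sep = " ", fill = TRUE)

ncol(wrapped)   # 3 columns, based on the first five lines
nrow(wrapped)   # more rows than the 6 lines in the file
```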

You can avoid this by telling read.table how many columns to create, which you do by supplying column names as in this answer. The number of fields on each line can be found using count.fields:

# Find the number of elements per line in test.txt
perline <- count.fields("test.txt", sep = " ")
maxlength <- max(perline)

# Read in test.txt
d <- read.table("test.txt", sep = " ", 
                row.names = 1, col.names = 1:maxlength,
                fill = TRUE)
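
As a quick sanity check after import, the row count should match the number of lines that count.fields saw. The following is a minimal sketch on a made-up file: the file contents and the names `tmp`, `perline` and `fixed` are hypothetical, and the column names are built with paste0 instead of plain integers purely as a stylistic choice.

```r
# Hypothetical toy file: the last line is longer than the first five
tmp <- tempfile(fileext = ".txt")
writeLines(c("a 1 2", "b 3 4", "c 5 6", "d 7 8", "e 9 10",
             "f 11 12 13 14 15 16"), tmp)

# Fields per line, counted over the whole file
perline <- count.fields(tmp, sep = " ")

# Size the table from the longest line instead of the first five
fixed <- read.table(tmp, sep = " ",
                    row.names = 1,
                    col.names = paste0("V", seq_len(max(perline))),
                    fill = TRUE)

# One row per line in the file; shorter lines are padded with NA
stopifnot(nrow(fixed) == length(perline))
```

If the check fails, some line is probably still being wrapped or skipped, and comparing `perline` against the imported rows is a reasonable place to start looking.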

Nice! Telling it to use the max # of columns seemed to do exactly what I wanted. Thanks Ken. Also thanks to @aosmith. — berge2015, Sep 12 '16 at 19:34
