read.delim() - errors "more columns than column names" and "header and 'col.names' are of different lengths"

Question

Preliminary information. OS: Windows XP Professional Version 2002 Service Pack 3; R version: R 2.12.2 (2011-02-25)

I am attempting to read a 30,000 row by 80 column, tab-delimited text file into R using the read.delim() function. The file does have column headers, with the following naming convention: "_". The code that I use to read the data in is:

cc <- c("integer", "character", "integer", rep("character", 3), 
        rep("integer", 73))

example_data <- read.delim(file = 'C:/example.txt', row.names = FALSE,
                           col.names = TRUE, as.is = TRUE, colClasses = cc)

After I submit this command, I receive the following error message:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
more columns than column names
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  header and 'col.names' are of different lengths

Information that may be important - from column 8 until column 80 the count of zeros in each column is as follows:

column 08: 29,000 zeros
column 13: 15,000 zeros
column 19: 500 zeros
column 43: 15,000 zeros
columns 65-80: 29,000 zeros for each column

Can anyone help identify reasons that I am receiving the above error messages? Any help will be greatly appreciated.

What does this return: `count.fields(file = 'C:/example.txt', sep="\t")[1:10]` ? — IRTFM, Sep 02 '11 at 13:47
@James: You are correct - cc is of length 79, which is the actual number of columns in my file. I rounded the dimensions in my post. — Jubbles, Sep 02 '11 at 13:51
@DWin: I've been using R for a few years, and I learn something new every day. Thanks for introducing the `count.fields()` function to me. — Jubbles, Sep 02 '11 at 13:59
@Jubbles : you're welcome. I consider `count.fields` an essential part of the data input toolkit. It's also useful for identifying which lines have those "weird bits" like unmatched quotes or unexpected comment characters. — IRTFM, Sep 02 '11 at 14:44
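
As an illustration of the `count.fields()` check suggested in the comments above, here is a minimal sketch. The temporary file and its contents are hypothetical stand-ins for the real data, and the `quote` and `comment.char` values are chosen to mirror the `read.delim()` defaults rather than taken from the thread.

```r
# Hypothetical stand-in for the real file: three columns, one malformed line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tvalue",
             "1\talpha\t10",
             "2\tbeta\t20\textra",   # one field too many
             "3\tgamma\t30"),
           tmp)

# Number of tab-separated fields found on each line, header line included.
# quote and comment.char are set to match what read.delim() uses by default.
n_fields <- count.fields(tmp, sep = "\t", quote = "\"", comment.char = "")
print(n_fields)

# Line numbers whose field count differs from that of the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

If every line reports the same count, the file's shape is consistent and the problem is more likely in the arguments passed to `read.delim()`.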

score 7 · Accepted Answer · answered Sep 02 '11 at 13:48

The cause of the problem is your use of col.names=TRUE. In read.delim, col.names is meant for supplying the column names of the resulting data frame by hand, so it must be a vector with one name per column in the input.

If you want read.delim to take the column names from the file, use header=TRUE instead. You may also wish to reconsider row.names=FALSE, as that argument is likewise intended to specify the row names rather than to switch reading them from the file on or off.

More information is available on the help page for read.delim.
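
A minimal sketch of what the corrected call might look like. The temporary file, its column names, and the four-element `cc` vector are hypothetical and stand in for the 79-column file from the question.

```r
# Hypothetical small file standing in for 'C:/example.txt'.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tsite_name\tcount_a\tcount_b",
             "1\tnorth\t0\t5",
             "2\tsouth\t3\t0"),
           tmp)

# One class per column, as in the question, just shorter.
cc <- c("integer", "character", rep("integer", 2))

# header = TRUE (already the read.delim default) takes the column names
# from the first line; row.names and col.names are simply left out.
example_data <- read.delim(file = tmp, header = TRUE, as.is = TRUE,
                           colClasses = cc)
str(example_data)

# col.names is still available, but as a character vector with
# one name per column, not as a TRUE/FALSE switch.
renamed <- read.delim(file = tmp, header = TRUE, as.is = TRUE,
                      colClasses = cc,
                      col.names = c("id", "site", "a", "b"))
names(renamed)
```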

MatthewS


You are correct. I now feel a bit embarrassed by my question. I'm so used to writing data with `write.table(..., row.names = FALSE, col.names = TRUE, ...)` that I forgot that when **reading in** data, `col.names` specifies a vector of character types. – Jubbles Sep 02 '11 at 13:54
No need to feel embarrassed, that inconsistency between `read.table` and `write.table` is rather a trap! – MatthewS Sep 02 '11 at 13:57
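
The asymmetry discussed in these comments can be seen in a small round trip. This is only a sketch; the data frame `df` and the temporary file are made up for the example.

```r
# Hypothetical data frame to write out and read back.
df <- data.frame(id = 1:3, label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".txt")

# Writing: col.names = TRUE means "emit a header line",
# row.names = FALSE means "do not write row labels".
write.table(df, file = tmp, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Reading: the matching switch is header = TRUE; col.names and
# row.names are not logical flags on the reading side.
back <- read.table(tmp, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)
print(back)
```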

score 5 · Answer 2 · answered Sep 02 '11 at 13:50

I also recently had the same error, and it disappeared after I converted the file to a comma- or semicolon-delimited format and read it with read.csv / read.csv2. I know this is not a fulfilling answer, but you might check that out.

langohrschnauze


score -1 · Answer 3 · answered Nov 29 '15 at 06:34

If you want to read the file as a character matrix, first convert it to .csv format and use read.csv. Don't pass any argument other than the file name, e.g.:

read.csv("filepath")

Mithilesh Kumar

