75

I am trying to load a csv file that has 14 columns like this:

StartDate, var1, var2, var3, ..., var14

when I issue this command:

systems <- read.table("http://getfile.pl?test.csv", header = TRUE, sep = ",")

I get an error message.

duplicate row.names are not allowed

It seems to me that the first column name is causing the issue. When I manually download the file and remove the StartDate name from the file, R successfully reads the file and replaces the first column name with X. Can someone tell me what is going on? The file is a (comma separated) csv file.

zx8754
  • 52,746
  • 12
  • 114
  • 209
george willy
  • 1,693
  • 8
  • 22
  • 26
  • 1
    That link (http://getfile.pl?test.csv) doesn't seem to work. Could you maybe copy and paste the first few lines of the file into the question, or provide a working link? – nograpes Jan 13 '12 at 16:48
  • that ws just an example. the link is internal, you wont be able to get to it. The file format is like this: date, var1, var2, var3, var4, var5 then populated with some data. I can open the file, it is accurate – george willy Jan 13 '12 at 17:04
  • systems <- read.table("http://getfile.pl?test.csv", header=FALSE, sep=","), I seem to get the file but know I have to deal with another row. If I print the contents of systems. this is how it looks: V1 V2 V3 V4 V5 StartDate, Server, uptime, load, memory – george willy Jan 13 '12 at 17:11
  • This question no longer has anything to do with (1) it's title nor (2) the answers below. @minitech just pinging you as the writer of the last edit, made a mere four hours after the question was posted...maybe you know how to salvage this question that has attracted 20k views. – Frank Jan 07 '15 at 19:54
  • 1
    @Frank: Whoops, thanks! Rolled back. – Ry- Jan 07 '15 at 21:04

8 Answers8

114

Then tell read.table not to use row.names:

systems <- read.table("http://getfile.pl?test.csv", 
                      header=TRUE, sep=",", row.names=NULL)

and now your rows will simply be numbered.

Also look at read.csv which is a wrapper for read.table which already sets the sep=',' and header=TRUE arguments so that your call simplifies to

systems <- read.csv("http://getfile.pl?test.csv", row.names=NULL)
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • did you mean read.csv rather than read.table on your final code chunk? – Tyler Rinker Jan 13 '12 at 17:16
  • 26
    I tried that before and it did not work. The first column now called "row.names" and the column names moved one cell further – george willy Jan 13 '12 at 17:17
  • @Tyler: yup, fixing now. Thanks! – Dirk Eddelbuettel Jan 13 '12 at 17:31
  • @Mike: That is _the point of it_ as it avoids the error you complain about. Either fix your source data, or not it as row.names and _the adjust the data afterwards_. You can easily copy all but the first column to a new data.frame. – Dirk Eddelbuettel Jan 13 '12 at 17:32
  • 4
    "and now your rows will simply be numbered" is not the full story. As your comment indicates you know, column names are also screwed up. If you know of a solution that does not require the awkward workaround mentioned in your comment (shift the column names, copy the data), that would be great. (Unfortunately, "fix your source data" is not sufficient for my case -- which doesn't have the OP's problem, which follows documented behavior http://stackoverflow.com/a/15285380/1191259 . Maybe I will make my case into a reproducible example some day.) – Frank Jan 07 '15 at 19:50
  • 4
    The OP already knew how to avoid the error. In fact, your answer is inferior to what he was doing before, as it generates an extra column, requiring that the data be copied. (I mis-wrote that it only messed up column names.) – Frank Jan 07 '15 at 22:11
  • I have add an situation where this problem may appear and suggested a solution. http://stackoverflow.com/questions/13239639/duplicate-row-names-error-reading-table-row-names-null-shifts-columns/36658971#36658971 – Kemin Zhou Apr 16 '16 at 00:52
44

This related question points out a part of the ?read.table documentation that explains your problem:

If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered.

Your header row likely has 1 fewer column than the rest of the file and so read.table assumes that the first column is the row.names (which must all be unique), not a column (which can contain duplicated values). You can fix this by using one of the following two Solutions:

  1. adding a delimiter (ie \t or ,) to the front or end of your header row in the source file, or,
  2. removing any trailing delimiters in your data

The choice will depend on the structure of your data.

test.csv Example:
If your test.csv looks like this:

v1,v2,v3 
a1,a2,a3,
b1,b2,b3,

By default, read.table interprets this file as having one fewer header columns than the data because the delimiters don't match. This is how it is interpreted by default:

   v1,v2,v3   # 3 items!! (header row)
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

The values in the first column (with no header) are interpreted as row.names: a1 and b1. If this column contains duplicate values, which is entirely possible, then you get the duplicate 'row.names' are not allowed error.

If you set row.names = FALSE, the header row shift doesn't happen, but you still have a mismatching number of columns in the header and in the data because the delimiters don't match.

This is how it is interpreted with row.names = FALSE:

v1,v2,v3   # 3 items!! (header row)
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

Solution 1 Add trailing delimiter to header row:

v1,v2,v3,  # 4 items!!
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

Or, add leading delimiter to header row:

,v1,v2,v3  # 4 items!!
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

Solution 2 Remove excess trailing delimiter from non-header rows:

v1,v2,v3   # 3 items
a1,a2,a3   # 3 items!!
b1,b2,b3   # 3 items!!
Brian D
  • 2,570
  • 1
  • 24
  • 43
3

In my case was a comma at the end of every line. By removing that worked

ianaz
  • 2,490
  • 3
  • 28
  • 37
2

I had this error when opening a CSV file and one of the fields had commas embedded in it. The field had quotes around it, and I had cut and paste the read.table with quote="" in it. Once I took quote="" out, the default behavior of read.table took over and killed the problem. So I went from this:

systems <- read.table("http://getfile.pl?test.csv", header=TRUE, sep=",", quote="")

to this:

systems <- read.table("http://getfile.pl?test.csv", header=TRUE, sep=",")
2

I used read_csv from the readr package

In my experience, the parameter row.names=NULL in the read.csv function will lead to a wrong reading of the file if a column name is missing, i.e. every column will be shifted.

read_csv solves this.

Timo Kvamme
  • 2,806
  • 1
  • 19
  • 24
  • to add another possible hint: check that the header contains the same number of columns as your data – Colin D Apr 12 '22 at 13:47
1

Another possible reason for this error is that you have entire rows duplicated. If that is the case, the problem is solved by removing the duplicate rows.

abruin
  • 81
  • 2
0

It seems the problem can arise from more than one reasons. Following two steps worked when I was having same error.

  1. I saved my file as MS-DOS csv. ( Earlier it was saved in as just csv , excel starter 2010 ). Opened the csv in notepad++. No coma was inconsistent (consistency as described above @Brian).
  2. Noticed I was not using argument sep="," . I used and it worked ( even though that is default argument!)
zx8754
  • 52,746
  • 12
  • 114
  • 209
qqqqq
  • 837
  • 12
  • 31
0

The answer here (https://stackoverflow.com/a/22408965/2236315) by @adrianoesch should help (e.g., solves "If you know of a solution that does not require the awkward workaround mentioned in your comment (shift the column names, copy the data), that would be great." and "...requiring that the data be copied" proposed by @Frank).

Note that if you open in some text editor, you should see that the number of header fields less than number of columns below the header row. In my case, the data set had a "," missing at the end of the last header field.

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
ximiki
  • 435
  • 6
  • 17