7

This is similar to read.csv row.names and to https://stackoverflow.com/questions/12425599/duplicated-row-names , but I do not see answers that help.

Problem: Trying to read in a file which contains duplicate numbers in first column, but shifts the column headers when row.names=NULL.

I'm trying to read the following file into R

TripId  VID TspVID  VWT VCLS    Week

201110041426    2226    33889   1   0   41

201110041501    2226    33889   1   0   41

201110041510    2226    33889   1   0   41

201110041557    2226    33889   1   0   41

(this is a small excerpt from Excel of a CSV file with many thousands of rows and ~200 columns. There are the same number of entries in the first row as in all others. There are duplicates in the first row. The columns don't line up with the labels in this view, but they do in CSV space.)

The command

> lm.table  <- read.table(file= file.in, sep=",", header=TRUE)
Error in read.table(file = file.in, sep = ",", header = TRUE) : 
  duplicate 'row.names' are not allowed

doesn't work. Using the first column for row.names implies that the first row has less values than the others, which is not the case. I certainly don't want the first column as row.names.

I try to set row.names=NULL

> lm.table  <- read.table(file= file.in, sep=",", header=TRUE, row.names=NULL)

which runs, but the columns have been shifted

> head(lm.table)

     row.names TripId   VID TspVID VWT VCLS       Week     Date TimeStart TimeEnd     Lat1

1 201110010006   2226 33889      1   0   40 2011/09/30 17:06:37  17:25:16 47.5168 -122.209

2 201110010028   2226 33889      1   0   40 2011/09/30 17:28:45  17:43:14 47.5517 -122.058

3 201110010000   2231 45781      1   0   40 2011/09/30 17:00:00  18:02:30 32.9010 -117.193

4 201110011407   2231 45781      1   0   40 2011/10/01 07:07:57  07:48:17 32.7044 -117.004

Note that the new column name "row.names has been introduced and the entire row shifted right.

Here is the tail end of the > head(lm.table) result. It's shifted the column labels onto an undefined column (I think this also shows the number of column labels = number of columns, which is also true from inspection.)

      FVavR FVstdR FIdlR

1  3.959140      2    NA

2  5.285770     20    NA

3  4.274140     26    NA

Any idea why I get the shifting in the columns and how to not shift and have the row.names simply be ascending numbers?

Community
  • 1
  • 1
Chris Wilson
  • 136
  • 1
  • 2
  • 3
  • Any odd characters in your field names? Have you tried `read.csv` instead of `read.table`? That saves specifying the `sep` and `header` and sets `fill=TRUE` (Although you say all of the rows have the same number of entries). – seancarmody Nov 05 '12 at 20:32

5 Answers5

8

had the same problem. just added this line:

colnames(rec) <- c(colnames(rec)[-1],"x")
rec$x <- NULL
adrianoesch
  • 81
  • 1
  • 3
  • 2
    I think you could condense this to one line as `colnames(rec) <- c(colnames(rec)[-1],NULL)` – Cole Jul 13 '16 at 22:59
0

I used the following code:

lm.table  <- read.table("file name", header=TRUE, row.names=NULL)

This added a column to the left with numbered row names, but I didn't find that the column names were shifted. Could it be that your column names still matched the right columns, but the R output made it look like the names had shifted?

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Ryan
  • 121
  • 1
  • 1
  • 11
  • The columns are definitely shifted. As I mentioned below, there is a trailing comma. Why this causes the "row.names" heading to take over an existing data column and shift the others is still a mystery to me. – Chris Wilson Nov 05 '12 at 22:15
0

I was having the same problem. I merged the date with timestamps and now I am able to read from the csv.

You can generate a row number as the first column (say using python) in your csv and then read it again.

ForeverLearner
  • 1,901
  • 2
  • 28
  • 51
0

My problem is associated with field separator for TAB-delimited file:

If I don't specify the field separator:

> condensed <- read.table("condense_report.tab", header=T)
 Error in read.table("condense_report.tab", header = T) : 
 duplicate 'row.names' are not allowed

If I add the sep=TAB

condensed <- read.table("condense_report.tab", header=TRUE, sep="\t")

Then there is no error message.

Here is my file's content (^I is the TAB character, and $ is the end of line mark):

Sample^Imethod^Ispecies^Imean_frac^Istd_frac^Imean_dep^Imean_clsz^Inumrep$ asterix_potion^Imothur^IEnterococcus faecalis^I0.32290000^I0.021755985650701942^I3293.5000^I3309.7500^I4$ asterix_potion^Imothur^IAcinetobacter baumannii^I0.28010000^I0.021539851624928375^I2869.5000^I2880.7500^I4$

The problem is with the species column. It has spaces, and R is using both space and tab as delimiters by default. So you have an extra column than the header according to R if no sep option is given. This is the root of the problem.

Kemin Zhou
  • 6,264
  • 2
  • 48
  • 56
0

If anyone else comes across this problem, my issue was that there were blank spaces in certain columns. After filling these spaces, my issue went away and I was able to load the .csv file just fine.

JKdub
  • 9
  • 2