1

Here is my problem in R:

mtable <- read.table(paste(".frunc_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE)

The folder part of the paste() call normally comes from a variable; I hard-coded it here for debugging.

I always get the message:

Error in read.table(paste(".frunc_1362704682.4574", "/groups.txt", sep = ""),  :
  duplicate 'row.names' are not allowed

But when I look at the file, which has this header:

root_node_name  node_name       node_id #genes_in_root_node     #genes_in_node  #genes_with_variable=1_in_root_node     #genes_with_variable=1_in_node  raw_p_underrepresentation_of_variable=1 raw_p_overrepresentation_of_variable=1  FWER_underrepresentation        FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation

I cannot see any duplicates. :( I read in another discussion that I should try:

mtable <- read.table(paste(".frunc_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE,row.names=NULL)

That works, but afterwards all the headings are shifted one column to the right:

> head(mtable, n=1)
           row.names                            root_node_name  node_name
1 molecular_function trans-hexaprenyltranstransferase activity GO:0000010
  node_id #genes_in_root_node #genes_in_node
1   17668                   2           2419
  #genes_with_variable=1_in_root_node #genes_with_variable=1_in_node
1                                   0                        0.74491
  raw_p_underrepresentation_of_variable=1
1                                       1
  raw_p_overrepresentation_of_variable=1 FWER_underrepresentation
1                                      1                        1
  FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation
1        

Any ideas how to get this right? :(

EDIT:

Okay, as a commenter said, this is mainly a problem with the rows. Stupid as I am, I thought it might come from the header. But I don't want to name the rows; it should just read them in. It can't be that hard, can it?

File-content:

molecular_function      trans-hexaprenyltranstransferase activity       GO:0000010      17668   2       2419    0       0.74491 1       1       1       -1      -1
molecular_function      single-stranded DNA specific endodeoxyribonuclease activity     GO:0000014      17668   5       2419    0       0.478885        1       1       1       -1      -1
molecular_function      lactase activity        GO:0000016      17668   1       2419    0       0.863086        1       1       1       -1      -1
molecular_function      alpha-1,3-mannosyltransferase activity  GO:0000033      17668   3       2419    0       0.64291 1       1       1       -1      -1
molecular_function      tRNA binding    GO:0000049      17668   27      2419    7       0.975698        0.0663832       1       1       -1      -1
molecular_function      fatty-acyl-CoA binding  GO:0000062      17668   20      2419    6       0.986407        0.0460431       1       1       -1      -1
molecular_function      L-ornithine transmembrane transporter activity  GO:0000064      17668   1       2419    0       0.863086        1       1       1       -1      -1
molecular_function      S-adenosylmethionine transmembrane transporter activity GO:0000095      17668   1       2419    0       0.863086        1       1       1       -1      -1
Smoki
  • Just to be clear, the header of a file is not what would cause duplicate `row.names` (header = `col.names`) – Señor O Mar 08 '13 at 01:32
  • Oh okay. How can I force the import then? It could well be that a whole row of my data is the same. I attached an example of the file above. So how do I force it? – Smoki Mar 08 '13 at 01:34
  • Duplicate row names, not duplicate rows – Señor O Mar 08 '13 at 01:36
  • Hmm, okay, but I didn't assign a row name?! I'm not very familiar with R; I just want to import the data from that text sheet. I don't need a 'row name', it's just a simple table. – Smoki Mar 08 '13 at 01:39
  • I saw a quite different problem. Writing an output of lm() to a file with dump(), – Roger Sep 16 '22 at 21:45

4 Answers

11

According to the R documentation for read.table():

If there is a header and the first row contains one fewer field 
than the number of columns, the first column in the input is used
for the row names. Otherwise if row.names is missing, the rows are numbered. 

... therefore I'd suggest that the first row may have one fewer field than the number of columns, so read.table() is selecting the first column (which contains more than one copy of molecular_function) as the row names.
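
One way to check this diagnosis is to count the fields on each line. The sketch below uses a made-up three-line miniature of such a file (the column names and values in `txt` are hypothetical, not taken from the question) in which each data row ends in an extra tab, so the header is one field short:

```r
# Hypothetical miniature of the file: the header has 3 fields, but each
# data row ends in an extra tab and therefore has 4.
txt <- "name\tid\tcount\nfoo\tA1\t2\t\nfoo\tA2\t5\t\n"

# count.fields() reports the number of fields found on each line.
con <- textConnection(txt)
n_fields <- count.fields(con, sep = "\t", quote = "", comment.char = "")
close(con)
print(n_fields)

# Because the header is one field short, read.table() takes the first
# column as row names, and the repeated "foo" triggers the error.
msg <- tryCatch(
  {
    read.table(text = txt, sep = "\t", header = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

On a real file, passing the file path to `count.fields()` instead of a text connection should show whether the header line has one field fewer than the data lines.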

Simon
  • Uhh, I tried `head -n 10 groups.txt | cat --show-tabs` and it seems that whoever wrote the original program that generates the files added a \t at the end of the line. It looks like: – Smoki Mar 08 '13 at 01:59
  • `read.table()` is a very smart and convenient function but I think sometimes it is just a little too smart for its own good (or, perhaps, for our good) when the file format isn't quite what we expect ... – Simon Mar 08 '13 at 02:28
1

The answer here (https://stackoverflow.com/a/22408965/2236315) by @adrianoesch should help.

Note that if you open the file in a text editor, you should see that the number of header fields is less than the number of columns in the rows below the header. In my case, the data set had a "," missing at the end of the last header field.

Community
ximiki
0

I ran into the same problem, and the issue was a tonne of tabular white-space at the bottom of my text file. Every row name on these lines was therefore the same (i.e. blank). This occurred because I had converted the file from Excel.
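
A minimal sketch of one way to work around this without editing the file by hand: drop the lines that contain only whitespace before parsing. The `lines` vector below is a hypothetical stand-in for what `readLines()` would return for such a file.

```r
# Hypothetical file contents: two data rows followed by lines that hold
# nothing but tabs, as can be left behind by a spreadsheet export.
lines <- c("name\tvalue", "foo\t1", "bar\t2", "\t", "\t")

# Keep only the lines that contain at least one non-whitespace character.
keep <- grepl("[^[:space:]]", lines)

# Parse the remaining lines as usual.
df <- read.table(text = lines[keep], sep = "\t", header = TRUE)
print(df)
```

For a file on disk, `lines` would come from `readLines()` on the file path, with the rest unchanged.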

amrezans
0

I have automatically generated data files that wind up with one column empty other than the header. I don't want to have to edit each file separately (and risk fouling it up). Best work-around I found was in question #4066607, to include "row.names=NULL" in the arguments.

DF<-read.csv(file, ..... , row.names=NULL)

This isn't perfect, but it lets me load the file. Unlike the behavior described in the other answer (forcing the addition of an extra column of row numbers), I get the original first column labeled "row.names" and all the headers shifted one column to the right, but at least all the data gets in.
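
If the shift comes from an extra empty field at the end of every data row, the headers can be realigned after loading. This is only a sketch under that assumption; `txt` is a made-up example, and `raw` and `fixed` are illustrative names.

```r
# Hypothetical miniature file: 3 header fields, 4 fields per data row
# because every data row ends in a stray tab.
txt <- "name\tid\tcount\nfoo\tA1\t2\t\nfoo\tA2\t5\t\n"

# row.names = NULL avoids the error but labels the first column
# "row.names" and shifts the real headers one column to the right.
raw <- read.table(text = txt, sep = "\t", header = TRUE, row.names = NULL)

# Drop the empty last column, then move the headers back one place.
fixed <- raw[, -ncol(raw)]
names(fixed) <- names(raw)[-1]
print(fixed)
```

This assumes the last column is entirely empty; if the extra column sits elsewhere, dropping the last one would discard real data, so it is worth inspecting `raw` first.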

KJG