I know this has been asked several times, but I've tried many of the previously suggested solutions and none of them work. No matter how I modify the CSV file, I keep getting the error below in R when I run the following:

annotation_file <- "Best3_Abicinctus_FunctionalAnnotation.csv"
annotation_info <- read.csv(annotation_file, row.names=1, header=T)
Error in read.table(file=file,header=header,sep=sep,quote=quote, : duplicate 'row.names' are not allowed

I cannot set 'row.names=NULL', as this would disrupt the data order for what I intend to do downstream. I even removed trailing blanks/tabs from every row using sed 's/[[:blank:]]*$//', but the error doesn't go away. I also tried replacing commas and spaces in all of the column entries, and the error still persists. This is what the first few lines of the file look like:

"gene_id","name","product"
"maker-Contig673-pred_gff_AUGUSTUS-gene-1.6","stk10","Serine/threonine-protein kinase 10"
"maker-Contig204-pred_gff_AUGUSTUS-gene-3.1","ccnh","Cyclin-H"
"maker-Contig31958-pred_gff_AUGUSTUS-gene-0.7","fam136a","Protein FAM136A"
"maker-Contig31340-pred_gff_AUGUSTUS-gene-0.8","h2b","Histone H2B"

The file is available here on Dropbox in case you would like to take a look. I'm on a deadline and I'm just helplessly stuck at this step. Any help would be highly appreciated.

  • By using `row.names=1` you are telling `read.csv` to use the `gene_id`-column as rownames. Looking at the first few lines, I can imagine that not all names are unique. Do you really want the `gene_id` to become rownames instead of an actual column in your dataset? – Jaap Sep 01 '21 at 08:01
  • Yes, because in the downstream steps I run the following: `sig_de_annotations <- annotation_info[rownames(sig_de_results),]; sig_de_results <- cbind(sig_de_annotations, as.data.frame(sig_de_results)); write.csv(sig_de_results, row.names=T, file="DEGlist_Deformed_vs_Healthy.csv")` – Rodriguez J Mathew Sep 01 '21 at 08:06
  • The sig_de_results contain the gene_id in its first column i.e., "maker-Contigxxx" – Rodriguez J Mathew Sep 01 '21 at 08:12
  • I think you will have to update your script because dealing with multiple rownames is not good practice. I suggest you keep `row.names = NULL` and adapt `sig_de_annotations <- annotation_info[which(annotation_info$gene_id %in% rownames(sig_de_results)),]` or something similar – Basti Sep 01 '21 at 08:29
  • Thanks @BastienDucreux for the updated script but the script seems to be screwing up the gene order in the final csv file – Rodriguez J Mathew Sep 01 '21 at 09:16
  • Could you add some details about the "downstream steps" you want to perform? It is usually a lot easier to store your data in a column than trying to deal with rownames problems. For the gene order you could use functions such as `dplyr::arrange` to sort your dataframe – Paul Sep 01 '21 at 09:24
  • Thanks for your response @Paul. After performing DESeq2, I have a list of significantly up/down-regulated gene_ids, along with other columns such as p-value, stored in sig_de_results. I just want to annotate those gene_ids (e.g., "maker-Contigxxx") in sig_de_results with the gene name info present in "Best3_Abicinctus_FunctionalAnnotation.csv". – Rodriguez J Mathew Sep 01 '21 at 09:37
  • and this is the R code I use to achieve this annotation: `sig_de_annotations <- annotation_info[rownames(sig_de_results),]; sig_de_results <- cbind(sig_de_annotations, as.data.frame(sig_de_results)); write.csv(sig_de_results, row.names=T, file="DEGlist_Deformed_vs_Healthy.csv")` – Rodriguez J Mathew Sep 01 '21 at 09:37
  • Thanks but that does not make your example reproducible. Please edit your question with a minimal [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), it will help us a lot to figure out how to help you. I.e. currently, we do not have enough to make the `sig_annotations` and `sig_results` objects – Paul Sep 01 '21 at 09:43
  • `gene_id` definitely should be a column and not a rowname. How does making it a column screw up the order? – Ronak Shah Sep 01 '21 at 11:18
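
The column-based approach suggested in the comments can preserve the order of `sig_de_results` if the lookup uses `match()` instead of `%in%`. Subsetting with `%in%` returns rows in the annotation table's order, which may be why the gene order changed. Below is a minimal sketch with made-up stand-in data: the toy `sig_de_results`, its `padj` values, and the name `sig_de_annotated` are hypothetical and not taken from the original script.

```r
# Toy annotation table with gene_id kept as an ordinary column.
annotation_info <- data.frame(
  gene_id = c("maker-Contig673-pred_gff_AUGUSTUS-gene-1.6",
              "maker-Contig204-pred_gff_AUGUSTUS-gene-3.1",
              "maker-Contig31958-pred_gff_AUGUSTUS-gene-0.7"),
  name    = c("stk10", "ccnh", "fam136a"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the DESeq2 results, with gene ids as row names.
sig_de_results <- data.frame(
  padj = c(0.01, 0.03),
  row.names = c("maker-Contig31958-pred_gff_AUGUSTUS-gene-0.7",
                "maker-Contig673-pred_gff_AUGUSTUS-gene-1.6")
)

# match() gives, for each result row, the position of its first match in the
# annotation table, so the output follows the order of sig_de_results.
idx <- match(rownames(sig_de_results), annotation_info$gene_id)
sig_de_annotated <- cbind(annotation_info[idx, ], sig_de_results)
print(sig_de_annotated)
```

Because `match()` only returns the first hit, duplicate `gene_id` entries in the annotation table are silently ignored after the first one; gene ids with no annotation would produce rows of `NA`.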

1 Answer

I fixed the issue by removing the duplicate gene ids in column 1 of the annotation file "Best3_Abicinctus_FunctionalAnnotation.csv". Earlier I had tested by only removing the duplicate rows. Thank you @Jaap for the hint and also to the other users for their various suggestions.
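
A minimal sketch of that fix, assuming the duplicates are rows that share a `gene_id` but differ in another column (which would explain why removing only fully identical rows did not help). The inline table and its repeated `gene_id` row are made up for illustration, and keeping the first row per `gene_id` is just one possible choice.

```r
# Toy stand-in for the annotation file; the duplicated gene_id row is hypothetical.
csv_text <- '"gene_id","name","product"
"maker-Contig673-pred_gff_AUGUSTUS-gene-1.6","stk10","Serine/threonine-protein kinase 10"
"maker-Contig204-pred_gff_AUGUSTUS-gene-3.1","ccnh","Cyclin-H"
"maker-Contig673-pred_gff_AUGUSTUS-gene-1.6","stk10","Serine/threonine-protein kinase 10 (fragment)"'

# Read without row names first so duplicates do not raise an error.
annotation_raw <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# List the gene ids that occur more than once.
dup_ids <- unique(annotation_raw$gene_id[duplicated(annotation_raw$gene_id)])
print(dup_ids)

# Keep the first row for each gene_id, then promote gene_id to row names.
annotation_info <- annotation_raw[!duplicated(annotation_raw$gene_id), ]
rownames(annotation_info) <- annotation_info$gene_id
annotation_info$gene_id <- NULL
```

With the real file, `text = csv_text` would be replaced by the file name. After this step, `annotation_info[rownames(sig_de_results), ]` can index by gene id as in the original script.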