
I know there are similar questions out there, but I have spent all day on Google and cannot find an answer to my issue. I have a GMT file in which I need to replace the Ensembl IDs with gene symbols so that I can run a gene set analysis. I also have a data frame that lists the Ensembl IDs with their matching gene symbols. This code works for a single column:

GMTdf$V3 <- Gene_list$hgnc_symbol[match(GMTdf$V3, Gene_list$ensembl_gene_id)]

What I cannot figure out is how to repeat it across the 495 columns of my GMT file. I have tried many things and nothing works. The only attempt that looked promising was the following, but it replaces everything with NA:

GMTdf[,3:495] = Gene_list$hgnc_symbol[match(GMTdf[,3:495], Gene_list$ensembl_gene_id)]

I have tried dplyr's mutate and the advice given on Stack Overflow about replacing Ensembl IDs with gene symbols, but I am too much of a beginner to work it out. Please help.

Cath

1 Answer


You can use lapply to apply the same function to each of several columns.

cols <- 3:495
GMTdf[cols] <- lapply(GMTdf[cols], function(x) 
                      Gene_list$hgnc_symbol[match(x, Gene_list$ensembl_gene_id)])
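
To try the lapply approach on something small, here is a minimal sketch with made-up toy data. The two tiny data frames are hypothetical stand-ins for your real GMTdf and Gene_list; only the object and column names come from the question. It also shows what I believe is the reason the all-at-once attempt returns NA: match() treats a data frame as a list of columns, so it compares whole columns rather than individual cells.

```r
# Hypothetical toy data standing in for the real Gene_list and GMTdf.
Gene_list <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  hgnc_symbol     = c("TP53", "BRCA1", "BRCA2"),
  stringsAsFactors = FALSE
)

GMTdf <- data.frame(
  V1 = c("SET_A", "SET_B"),                          # gene set name
  V2 = c("toy description A", "toy description B"),  # description
  V3 = c("ENSG00000141510", "ENSG00000012048"),
  V4 = c("ENSG00000139618", ""),  # assumed: shorter gene sets leave empty cells
  stringsAsFactors = FALSE
)

cols <- 3:ncol(GMTdf)  # the gene columns; 3:495 in the question

# Passing several columns at once: match() sees a list with one element per
# column, so it returns one value per column and none of them match.
whole_df <- match(GMTdf[, cols], Gene_list$ensembl_gene_id)

# Column by column: each x is a plain character vector, so match() works
# cell by cell and the lookup returns one symbol per cell.
GMTdf[cols] <- lapply(GMTdf[cols], function(x)
  Gene_list$hgnc_symbol[match(x, Gene_list$ensembl_gene_id)])

print(GMTdf)
```

Cells with no entry in Gene_list, including the empty cells that pad shorter gene sets, come back as NA, so you may want to handle those before writing the GMT file back out.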

In dplyr, you can do the same with across.

library(dplyr)
# all_of() marks cols as an external vector of column positions
GMTdf <- GMTdf %>% mutate(across(all_of(cols), 
                   ~Gene_list$hgnc_symbol[match(.x, Gene_list$ensembl_gene_id)]))
Ronak Shah