I am looking for an R solution (or a general logic solution) to convert Homo sapiens gene names into Danio rerio gene names. My coding skills are fairly basic, so I tried writing something with for-loops and if-statements, but it picks up only one ortholog per gene, even when there are several. For example, the human gene REG3G has three zebrafish orthologs: si:ch211-125e6.13, zgc:172053, and lectin. The code I wrote (below) keeps only the last match, but I would like it to output all three.

I have also had trouble finding R/biomaRt code for this task and would appreciate any advice.

# read_excel() comes from the readxl package.
library(readxl)

# Read the Excel file containing the list of zebrafish genes and their human orthologs.
ortho_genes <- read_excel("/Users/talha/Desktop/Ortho_Gene_List.xlsx")

# Separate data from excel file into lists.
zebrafish <- ortho_genes$`Zebra Gene Name`
human <- ortho_genes$`Human Gene Name`

# Sample list of differentially expressed genes
sample_list <- c("GREB1L","SIN3B","NCAPG2","FAM50A","PSMD12","BPTF","SLF2","SMC5", "SMC6", "TMEM260","SSBP1","TCF12", "ANLN", "TFAM", "DDX3X","REG3G")

# Make a two-column matrix with one row per gene in the supplied list.
final_m <- matrix(nrow=length(sample_list),ncol=2)

# Iterate through every gene in the supplied list
for(x in 1:length(sample_list)){
  
  # Iterate through every human gene
  for(y in 1:length(human)){
    
    # If the gene from the supplied list matches a human gene
    if(sample_list[x] == human[y]){
      
      # Fill our matrix in with the supplied gene and the zebrafish ortholog
      # that matches up with the cell of the human gene
      final_m[x,1] = sample_list[x]
      final_m[x,2] = zebrafish[y]
    }
  }
}
  • Please include the first few rows of `zebrafish` and `human`. Is there a 1:1 relationship between the ortho_genes lists? Rather than a nested loop, this might be suited for a `join()` or `merge()` – M.Viking Nov 15 '22 at 19:55
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Include data in the question itself rather than external files. – MrFlick Nov 15 '22 at 20:17
  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here so people can troubleshoot your problems. One way of doing this is by using the `dput` function. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Nov 16 '22 at 05:33

1 Answer

You didn't specify the structure of ortho_genes. This is my guess:

ortho_genes <- tibble::tibble(Zebra = c("greb1l", "sin3b", "ncapg2", "fam50a", "psmd12", "bptf", "fam178a"),
                              Human = c("GREB1L","SIN3B","NCAPG2","FAM50A","PSMD12","BPTF","SLF2"))

You can simply subset the table using sample_list (which is a vector, not a list):

sample_list <- c("NCAPG2", "SLF2", "GREB1L")
ortho_genes[ortho_genes$Human %in% sample_list,]
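
The same `%in%` subset keeps every matching row, so a human gene with several zebrafish orthologs comes back as several rows. Here is a minimal sketch, assuming the table has one row per ortholog pair. The REG3G rows are taken from the question; the column names `Zebra` and `Human` are my guess from above, and the `collapsed` step is only one possible way to get a single row per human gene.

```r
# Assumed structure: one row per (zebrafish, human) ortholog pair.
ortho_genes <- data.frame(
  Zebra = c("greb1l", "si:ch211-125e6.13", "zgc:172053", "lectin"),
  Human = c("GREB1L", "REG3G", "REG3G", "REG3G"),
  stringsAsFactors = FALSE
)
sample_list <- c("REG3G", "GREB1L")

# Keep every row whose human gene appears in the sample vector.
# All three REG3G rows are retained, not just the last one.
hits <- ortho_genes[ortho_genes$Human %in% sample_list, ]

# Optional: collapse to one row per human gene,
# with the zebrafish orthologs joined into a comma-separated string.
collapsed <- aggregate(
  Zebra ~ Human,
  data = hits,
  FUN = function(z) paste(z, collapse = ", ")
)

print(hits)
print(collapsed)
```

Unlike the nested loop in the question, nothing is overwritten here, because each match gets its own row instead of sharing one matrix cell.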

You also didn't specify how you want the output. Do you need a matrix? If you want to write the result into a file, a matrix may not be optimal.
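
If a file is the goal, a data frame can be written out directly. This is only a sketch: the `result` table stands in for whatever the subsetting step returns, and the file name is hypothetical (it is written to a temporary directory here so the example does not touch your own folders).

```r
# Stand-in for the subset result: one row per ortholog pair.
result <- data.frame(
  Human = c("REG3G", "REG3G", "REG3G"),
  Zebra = c("si:ch211-125e6.13", "zgc:172053", "lectin"),
  stringsAsFactors = FALSE
)

# Hypothetical output path inside the session's temporary directory.
out_file <- file.path(tempdir(), "zebrafish_orthologs.csv")

# row.names = FALSE avoids writing an extra column of row numbers.
write.csv(result, out_file, row.names = FALSE)

# Read the file back to confirm what was written.
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```

A data frame keeps the column names and allows several rows per human gene, which a pre-sized matrix with one row per sample gene cannot do.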

Cloudberry