I want to find out whether certain genes cluster together. I already have a list of genes with their start and stop locations, and I know how to calculate the distance between them. The problem is that I don't know how to take the switch between chromosomes into account.
You can't measure the distance between a gene on chromosome 1 and a gene on chromosome 2.
I thought of calculating the distance like this: start location of gene 2 minus stop location of gene 1. That gives the distance between these two genes.
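To make that calculation concrete, here is a minimal sketch with made-up coordinates. The variable names and numbers are only for illustration, and it assumes the genes are already sorted by position on a single chromosome:

```r
# Toy coordinates (hypothetical values), already sorted by start position
start_position <- c(100, 500, 900)
end_position   <- c(200, 650, 1000)

n <- length(start_position)

# Start of gene i+1 minus end of gene i, for every consecutive pair
gap <- start_position[-1] - end_position[-n]
gap
```

Dropping the first start and the last end lines the two vectors up so that each element of gap belongs to one pair of neighbouring genes.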
But how do I account for this: at the boundary between chromosomes, the R code takes the start location of a gene on chromosome 2 and the stop location of a gene on chromosome 1, and a distance between those is not meaningful (for my research, at least).
So I am wondering how to handle that in R. I just need to somehow skip pairs of genes that are on different chromosomes.
I hope you guys can help me.
About the code below: the three vectors hold the start locations, the stop locations, and the chromosomes. They are all of equal length. chromosomes is a vector containing the chromosome number of every gene.
start_vector <- as.vector(sorted_coords$start_position)
end_vector <- as.vector(sorted_coords$end_position)
chromosomes <- as.vector(sorted_coords$chromosome_name)
chromosomes[is.na(chromosomes)] <- 24
count <- 0
for (i in 1:length(chromosomes)) {
  if (count != chromosomes[i]) {
    start <- i - 1
    end <- i + 1
    start_vector <- start_vector[-start]
    end_vector <- end_vector[-end]
    count <- count + 1
  }
}
I expect a vector of the distances between all the genes, excluding the distances between genes that lie on different chromosomes.
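To illustrate the result I am after, here is a minimal sketch on toy data. The values are made up, the column names are the ones from sorted_coords, and it assumes the table is sorted by chromosome and then by start position, with no missing chromosome names. It is one possible approach, not necessarily the best one:

```r
# Toy data (hypothetical values), sorted by chromosome and then by start position
sorted_coords <- data.frame(
  chromosome_name = c("1", "1", "1", "2", "2", "X"),
  start_position  = c(100, 500, 900, 50, 400, 10),
  end_position    = c(200, 650, 1000, 150, 480, 90)
)

n <- nrow(sorted_coords)
chrom <- as.character(sorted_coords$chromosome_name)

# Gap between each gene and the next gene in the sorted table
gap <- sorted_coords$start_position[-1] - sorted_coords$end_position[-n]

# TRUE where a gene and its successor are on the same chromosome
same_chrom <- chrom[-1] == chrom[-n]

# Keep only the within-chromosome distances
distances <- gap[same_chrom]
distances
```

Here the two pairs that straddle a chromosome boundary (1 to 2, and 2 to X) are dropped, leaving three distances. Comparing the chromosome vector with a shifted copy of itself avoids deleting elements inside a loop, which shifts the indices of everything after the deleted element.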