I have a table of Lnc/gene pairs with their distances, and I need to filter it to get the closest gene for each Lnc.
Example:
GeneX Lnc1 1KB
GeneY Lnc4 20KB
Thank you in advance
Below is one possible dplyr solution. Please try to make your questions reproducible by sharing a minimal dataset/code.
# importing the necessary package
library(dplyr)
# reproducing your data
# tibble() replaces the deprecated data_frame()
df <- tibble(
  Gene = c("Gene X", "Gene X", "Gene X", "Gene Y"),
  Lnc = c("Lnc1", "Lnc2", "Lnc3", "Lnc4"),
  `Distance (KB)` = c(1, 300, 200, 20)
)
# grouping by Gene and choosing the minimum Gene-Lnc distance
df %>%
  group_by(Gene) %>%
  filter(`Distance (KB)` == min(`Distance (KB)`))
# # A tibble: 2 x 3
# # Groups: Gene [2]
# Gene Lnc `Distance (KB)`
# <chr> <chr> <dbl>
# 1 Gene X Lnc1 1
# 2 Gene Y Lnc4 20
If you want exactly one Lnc/gene pair per gene, even when several pairs are tied at the closest distance, you can also use the following:
df %>%
  group_by(Gene) %>%
  arrange(`Distance (KB)`) %>%
  summarise(Lnc = first(Lnc), Dist = first(`Distance (KB)`))
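Both snippets above group by Gene, so they return the closest Lnc for each gene. If you need the reverse, as the question is worded (the closest gene for each Lnc), a minimal sketch is to group by Lnc instead. The table `pairs` below is made-up data in which each Lnc is paired with more than one gene; it assumes dplyr >= 1.0.0 for `slice_min()`.

```r
library(dplyr)

# hypothetical data: every Lnc appears with two candidate genes
pairs <- tibble(
  Gene = c("Gene X", "Gene Y", "Gene X", "Gene Y"),
  Lnc = c("Lnc1", "Lnc1", "Lnc4", "Lnc4"),
  `Distance (KB)` = c(1, 300, 200, 20)
)

# for each Lnc, keep the single row with the smallest distance;
# with_ties = FALSE returns one row per Lnc even if distances are tied
closest <- pairs %>%
  group_by(Lnc) %>%
  slice_min(`Distance (KB)`, n = 1, with_ties = FALSE) %>%
  ungroup()

print(closest)
```

With this toy table, Lnc1 is matched to Gene X (1 KB) and Lnc4 to Gene Y (20 KB). Unlike the `summarise()` version, `slice_min()` keeps all columns of the original table, which is convenient if it has more than three.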