I have a genetic dataset of chromosome positions, I am trying to use the positions to find gene lengths. For example:
#Input:
Chr Start End Genes
1 1 2 Gene1
1 3 4 Gene1
1 5 9 Gene2
2 1 3 Gene3
#Expected output calculating gene lengths:
Chr Start End Genes Length
1 1 2 Gene1 3
1 3 4 Gene1 3
1 5 9 Gene2 4
2 1 3 Gene3 2
So I am looking to find for each gene the maximum End
value minus the minimum Start
value and put that value in a new Length
column.
I've been trying to go about this with something like:
test <- df %>%
group_by(Genes) %>%
df$Length = (min(df$Start) - max(df$End))
I've also been trying to find a data.table
solution (as my real data is very big) but I am not experienced.