Hello, I have a dataframe such as:
COL1 start end Category
A 30 70 Cat1
A 10 20 Cat2
A 90 300 Cat2
A 12 26 Cat2
A 72 145 Cat2
B 71 145 Cat2
B 250 350 Cut3
B 355 600 Cat2
I'm looking for code that, for each df$COL1, takes a region such as df$Category=="Cat1"
and counts the df$Category=="Cat2"
regions that flank it, where a flanking region must lie at a distance of at most 5 from it.
Let's take an example. For each df$COL1
and each df$Category
I count the number of flanking Cat2 regions:
Here
COL1 start end Category
A 30 70 Cat1
so I'm looking for Cat2 regions that end no earlier than 25 (30 - 5) on the left side,
or start no later than 75 (70 + 5) on the right side.
When I look into the df, I see these Cat2 rows:
A 10 20 Cat2 <- this one is too far away (20 is -10 from 30)
A 90 300 Cat2 <- this one is too far away (90 is +20 from 70)
A 72 145 Cat2 <- this one is OK since 72 is just +2 from 70
A 12 26 Cat2 <- this one is OK since 26 is just -4 from 30
So I add the count to a table such as:
New_df
COL1 Nb_flanking
A 2
Then I do the same for df$COL1 == "B":
COL1 start end Category
B 250 350 Cut3
I'm looking for Cat2 regions that end no earlier than 245 (250 - 5) on the left side,
or start no later than 355 (350 + 5) on the right side.
When I look into the df, I see these Cat2 rows:
B 71 145 Cat2 <- this one is too far away (145 is -105 from 250)
B 355 600 Cat2 <- this one is OK since 355 is just +5 from 350
Then I fill in New_df:
COL1 Nb_flanking
A 2
B 1
and so on for any other df$COL1 values.
Here is the data (dput output):
structure(list(COL1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L), .Label = c("A", "B"), class = "factor"), start = c(30L,
10L, 90L, 12L, 72L, 71L, 250L, 355L), end = c(70L, 20L, 300L,
26L, 145L, 145L, 350L, 600L), Category = structure(c(1L, 2L,
2L, 2L, 2L, 2L, 3L, 2L), .Label = c("Cat1", "Cat2", "Cut3"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L))
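To make the rule concrete, here is a rough base R sketch of the counting described above. It is only an illustration under a few assumptions that the description does not fully settle: the reference regions are taken to be the rows whose Category is not Cat2, a gap of exactly 5 still counts (as in the B example), and Cat2 regions that overlap the reference region are not counted. The names count_flanking and max_gap are made up for this sketch.

```r
# Example data from the question, rebuilt with plain character columns
df <- data.frame(
  COL1     = c("A", "A", "A", "A", "A", "B", "B", "B"),
  start    = c(30L, 10L, 90L, 12L, 72L, 71L, 250L, 355L),
  end      = c(70L, 20L, 300L, 26L, 145L, 145L, 350L, 600L),
  Category = c("Cat1", "Cat2", "Cat2", "Cat2", "Cat2", "Cat2", "Cut3", "Cat2"),
  stringsAsFactors = FALSE
)

# Assumption: a gap of exactly 5 is still accepted
max_gap <- 5

# Count the Cat2 rows that flank the reference rows within one COL1 group
count_flanking <- function(d, max_gap) {
  ref   <- d[d$Category != "Cat2", ]  # assumption: reference = non-Cat2 rows
  flank <- d[d$Category == "Cat2", ]
  n <- 0L
  for (i in seq_len(nrow(ref))) {
    # Gap when the Cat2 region ends before the reference region starts
    left_gap  <- ref$start[i] - flank$end
    # Gap when the Cat2 region starts after the reference region ends
    right_gap <- flank$start - ref$end[i]
    is_flank  <- (left_gap  >= 0 & left_gap  <= max_gap) |
                 (right_gap >= 0 & right_gap <= max_gap)
    n <- n + sum(is_flank)
  }
  n
}

# Apply the count to each COL1 group and collect the results
counts <- sapply(split(df, df$COL1), count_flanking, max_gap = max_gap)
New_df <- data.frame(COL1 = names(counts), Nb_flanking = as.integer(counts))
print(New_df)
```

On the example data this gives Nb_flanking = 2 for A and 1 for B, matching the table above.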
Thank you very much for your help and time.