I have a data frame where there are some duplicated x values. Those values are associated with a range. The values that are duplicated (duplicated x values) and that have an overlapping range (both lines are overlapping in low and up column) should be deleted, but I want to keep the values that they refer to (the val colum).
library(dplyr)
df = data.frame(x=c("A","A","A","B","B","B","C"),
low = c(-10,-5,100,100,200,300,10),
up = c(2,3,200,150,250,350,20),
val = c(1,2,150,125,225,325,15))
df
x low up val
1 A -10 2 1
2 A -5 3 2
3 A 100 200 150
4 B 100 150 125
5 B 200 250 225
6 B 300 350 325
7 C 10 20 15
As you can see, line 1 and 2 are overlapping. Looking at the example here. I tried to write a dplyr code, but it's not giving me the expected result. The difference with the example is that I'm not "merging" 2 datasets. So I don't know how to test different ranges within a dataset that contains duplicated values. I was thinking to group the duplicated values in the data set and test them individually. But since it's not integrated in the dplyr function, I think it's not working.
df.gr = df %>%
group_by(x)
df.gr[with(df.gr, low <= up),]
In the end, I want something like this
x low up val
1 A -10 3 1,2
2 A 100 200 150
3 B 100 150 125
4 B 200 250 225
5 B 300 350 325
6 C 10 20 15
As you can see, for the range that are overlapping, I'm keeping the lowest value, and the highest of the 2 ranges. In addition, I want to "remember" the values in the column "val". That's why I want to end up with 1,2 in the range that is overlapping.
Also, in the example I'm citing here, there was only 2 ranges for each duplicated values. In my case I could have 2 duplicated values or more. I want to test any duplicated value and test their range and see if they overlap.
For example,
df = data.frame(x=c("A","A","A","A","B","B","B","C"),
low = c(-10,-5,-2,100,100,200,300,10),
up = c(2,3,4,200,150,250,350,20),
val = c(1,2,3,150,125,225,325,15))
df
x low up val
1 A -10 2 1
2 A -5 3 2
3 A -2 4 3
4 A 100 200 150
5 B 100 150 125
6 B 200 250 225
7 B 300 350 325
8 C 10 20 15
Would give:
x low up val
1 A -10 4 1,2,3
2 A 100 200 150
3 B 100 150 125
4 B 200 250 225
5 B 300 350 325
6 C 10 20 15
I've also tried to use the spread
function in tidy without luck:
tidyr::spread(df,x,val)
low up A B C
1 -10 2 1 NA NA
2 -5 3 2 NA NA
3 -2 4 3 NA NA
4 10 20 NA NA 15
5 100 150 NA 125 NA
6 100 200 150 NA NA
7 200 250 NA 225 NA
8 300 350 NA 325 NA
Basically, if I want to use this I would have to put as key both the low and up columns, but I can't do that. Also, the number of overlap for each duplicated values would make the size of the datagram variable when it's creating the different columns. So I don't know how to proceed with this...