Lets say we have to data.frames:
x <- data.frame(date=c(1,2,3,1,3), id=c("a", "a", "a", "b", "b"), sum=50:54)
y <- data.frame(date=c(1,2,1,3), id=c("a", "a", "b", "b"))
x
date id sum
1 1 a 50
2 2 a 51
3 3 a 52
4 1 b 53
5 3 b 54
y
date id
1 1 a
2 2 a
3 1 b
4 3 b
Now, i want to find the row in x that has dates that is not in y, within the same id. In y we have 1, 2 and 3 in id a and in y we only have 1 and 2 in id a. How do i identify (and, preferably remove from x) row number 3 in x?
EDIT: I found a (very ugly and slow) solution, but there has to be a better and faster one? Currently im running it on two large data.frames, and first time it took more than one hour. I need to run it multiple times, so any help would be appreciated.
z <- data.frame()
for (f in 1:length(unique(x$id))) { #Run the iteration for all the unique id's in x
id <- unique(x$id)[f] #find the name of the id in this iteriation
a <- x[x$id==id,] #subset x
b <- y[y$id==id,] #subset y
x.new <- a[a$date%in%unique(b$date),] #find the dates that are in x and also in y
z <- rbind(z, x.new) #bind the two data.frames together
}