0

I have two data frames. I need to delete ids that are not matching to each other. For example, I need to deleted id 4, 7, 10's data in a and id 5, 12, 15 in b. I need to delete data from both data frames.

a:

    id   name
1    1    A
2    2    B
3    4    D
4    7    G
5    8    H
6    9    I
7    10   J
8    13   M

b:

    rating  id   name
1     3      1    A
2     4      2    B
3     4      5    E
4     5      8    H
5     2      9    I
6     1      12   L
7     3      13   M
8     4      15   O

Update:

c:
        rating  id   name
    1     4      2    B
    2     2      9    I
    3     3      13   M

Another question, if all C's data matching B, and I want to delete C's data in B, so B will only have id 1, 5, 8, 12, 15.

Makiyo
  • 441
  • 5
  • 23
  • 2
    `subset(a, id %in% intersect(a$id, b$id))` and similar for `b` – Cath Sep 29 '17 at 09:36
  • If you are updating the datasets, it become difficult to give a solution – akrun Sep 29 '17 at 09:42
  • @akrun I actually asked the wrong question, sorry! I just updated the new question. – Makiyo Sep 29 '17 at 09:55
  • The one you updated would be `anti_join(b, c['id'])` – akrun Sep 29 '17 at 10:01
  • If you want to remove only those rows of `b` where *all* of `c`'s data match, anti-joining on `id` alone is not sufficient. With `data.table` this becomes: `library(data.table); b[!c, on = .(rating, id, name)]` – Uwe Sep 29 '17 at 10:07

2 Answers2

1

We need an inner_join

library(dplyr)
inner_join(a, b['id'], by = 'id')
akrun
  • 874,273
  • 37
  • 540
  • 662
0
df1 <- read.table(text = "id   name
1    1    A
2    2    B
3    4    D
4    7    G
5    8    H
6    9    I
7    10   J
8    13   M", header = T)

df2 <- read.table(text = "id   name
1    1    A
2    2    B
3    5    E
4    8    H
5    9    I
6    12   L
7    13   M
8    15   O", header = T)

ids <- merge(x = df1, y = df2, by = "id", all = F )$id

df1 <- df1[df1$id %in% ids,]
df2 <- df2[df2$id %in% ids,]
Hardik Gupta
  • 4,700
  • 9
  • 41
  • 83