1

I want to filter a datatable(d1) for all rows sharing the same id as another datatable(d2).

d1 <- data.table(id = c('123','132','142', '153', '143'),
                 x1 = c('a','a','b','e','r'),
                 x2 = c('d','e','f','z','o'))
    
d2 <- data.table(id = c('123','132','142', '143', '165', '153', '143'),
                   y1= c('ABCDE', 'CDEFE', 'EFSAD', 'FEASD', 'SDSDS', 'VDCDS', 'DSXSZ' )

Expected output:

   id    y1
1: 123  ABCDE
2: 132  CDEFE
3: 142  EFSAD
4: 153  VDCDS
5: 143  DSXSZ

I have tried the simple code:

d2[id %in% d1$id]

But R gets stuck when running the code and the stop sign to stop the code does not show up and ESC when in the console does not work. I need to restart R to stop the code.

I know how to do this in dplyr

d2 %>%
 group_by(id) %>%
  filter(lopnr %in% d1$lopnr)

but i need to do it in data.table since I am dealing with large dataframes and it takes too long time in dplyr.

camille
  • 16,432
  • 18
  • 38
  • 60
ccalle
  • 53
  • 5
  • 2
    (1) Your `data.table(..)` definition for `d2` is missing a close-paren, that could be causing a problem (or is just a red herring in the question). (2) You have two `d2` ids of 143, but only one of them remains. Why? – r2evans Feb 02 '22 at 23:54
  • 2
    I expect you'll get much better performance from `d2 %>% right_join(d1 %>% distinct(id))` or equivalently `d1 %>% distinct(id) %>% left_join(d2)`. This should have the effect of only including the rows of d2 whose ids appear in d1. – Jon Spring Feb 02 '22 at 23:55
  • 1
    What you're describing is a semi-join – camille Feb 03 '22 at 00:18
  • Applicable for the "merge/join" discussions: https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272 – r2evans Feb 03 '22 at 01:06

2 Answers2

1
newdf <- merge(d2,d1, all = FALSE)

This will still bring in the other columns but you can simply drop these after

mgd6
  • 168
  • 8
1

You can use this approach:

library(data.table)

d2[d1, on = .(id), nomatch = NULL][, c("id", "y1")]

Output

    id    y1
1: 123 ABCDE
2: 132 CDEFE
3: 142 EFSAD
4: 153 VDCDS
5: 143 FEASD
6: 143 DSXSZ
Jose
  • 421
  • 3
  • 10