0

I have a two data frames (data.1 and data.2) with three columns that are present in both (CITY, COUNTY and STATE).

What I want is to filter the first data frame based on values in three columns present in the second data frame (that may have extra, so I need intersection). Although I can produce a working code as here: select rows in one data frame that partially match rows in another data frame

I'd like to have a more generic solution, where names of columns are passed, for example. I know a solution using left_join in dplyr, but I don't need the extra columns, I just want to filter.

data.1 <- read.table(text= "
     CITY     COUNTY   STATE        AA
       1          1         1        2
       2          1         1        4
       1          2         1        6
       2          2         1        8
       1          1         2       20
       2          1         2       40
       1          2         2       60
       2          2         2       80
       1          1         3      200
       2          1         3      400
       1          2         3      600
       2          2         3      800
       1          1         4     2000
       2          1         4     4000
       1          2         4     6000
       2          2         4     8000
       1          1         5    20000
       2          1         5    40000
       1          2         5    60000
       2          2         5    80000
", header=TRUE, na.strings=NA)

data.2 <- read.table(text= "
     CITY     COUNTY     STATE      BB
       1          1         2      -10
       2          1         2      -11
       1          2         2      -12
       2          2         2      -13
       1          1         4      -14
       2          1         4      -15
       1          2         4      -16
       2          2         4      -17
       3          3         4      -18
", header=TRUE, na.strings=NA)

desired.result <- read.table(text= "
     CITY     COUNTY    STATE       AA
       1          1         2       20
       2          1         2       40
       1          2         2       60
       2          2         2       80
       1          1         4     2000
       2          1         4     4000
       1          2         4     6000
       2          2         4     8000
", header=TRUE, na.strings=NA)

How can I achieve this?

Community
  • 1
  • 1
JelenaČuklina
  • 3,574
  • 2
  • 22
  • 35
  • 1
    Use `merge` (or any other `join` method from `dplyr`) and remove extra columns afterwise. For instance `merge(data.1,data.2,all=FALSE)[-5]`. – nicola Jan 21 '16 at 09:02
  • Do I understand correctly that I can't answer my own question? The function I was looking for is called `semi_join` from `dplyr` package. I would think of keeping the question for the sake of those new to table joins that learn terminology with the hep of SO. :) – JelenaČuklina Jan 21 '16 at 10:03

0 Answers0