-1

I tried "unique" and "duplicated" but cannot get R to do what I want, which is basically compare two sets of data and find out who one the first data set is not on the second data set. data1 contains a customer ID, name and the year that person bought X. data2 contains a customer ID and year (2017) indicating they purchased X this year. What I want to do is extract a list of people from data1 who have NOT purchase X this year...so I can contact them and tell them to buy X again.

> data1 
ID  NAME YEAR 
8   Ann 2016 
10  Bill 2014
11  Doug 2016
12  Emma 2015
5   Fred 2014
9   Julie 2014
13  Karl 2016
15  Matt 2014
14  Rhett 2014
7   Sara 2015
4   Tom 2014

> data2
ID YEAR
29 2017
32 2017
10 2017
21 2017
11 2017
5  2017
28 2017
33 2017
24 2017
22 2017
31 2017
15 2017
25 2017
30 2017
26 2017
7  2017
23 2017
27 2017

Merging data1 and data2 by ID ( merge(data1,dat2, by"ID") ) gives me:

> merged_d1d2
  ID NAME YEAR.x YEAR.y
1  5 Fred   2014   2017
2  7 Sara   2015   2017
3 10 Bill   2014   2017
4 11 Doug   2016   2017
5 15 Matt   2014   2017

...But I want everyone EXCEPT these people! I also added the names into data2 and then combined data1 and data2 using rbind which gives me a data set with duplicates (e.g. 2 Fred, 2 Sara, 2 Bill, etc.) I then tried to use "unique" and "duplicated" but these always leave one of those duplicates (1 Fred, 1 Sara) in the new data. I want everyone from data1 except those people. I have a feeling this is a simple process, but any help would be greatly appreciated.

Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
Matt B
  • 1
  • Crap...sorry those data sets didn't come out the way I wanted. – Matt B Apr 13 '17 at 17:15
  • Well, I just figured it out. I used dplyr ::left_join(data1, data2, by "YEAR". Exported to Exel, sorted by "YEAR.y" and can just delete those people I don't want. Sorry for wasting your time! – Matt B Apr 13 '17 at 17:32
  • Possible duplicate of [Delete rows that exist in another data frame?](http://stackoverflow.com/questions/17338411/delete-rows-that-exist-in-another-data-frame) – Ronak Shah Apr 13 '17 at 18:27

1 Answers1

2

Simply:

data1[!data1$ID%in%data2$ID,]

   ID  NAME YEAR
1   8   Ann 2016
4  12  Emma 2015
6   9 Julie 2014
7  13  Karl 2016
9  14 Rhett 2014
11  4   Tom 2014

Or you could try anti_join by ID from dplyr:

data1 <- read.table(text="ID  NAME YEAR
8   Ann 2016
10  Bill 2014
11  Doug 2016
12  Emma 2015
5   Fred 2014
9   Julie 2014
13  Karl 2016
15  Matt 2014
14  Rhett 2014
7   Sara 2015
4   Tom 2014",header=TRUE, stringsAsFactors=FALSE)

data2 <- read.table(text="ID YEAR
29 2017
32 2017
10 2017
21 2017
11 2017
5  2017
28 2017
33 2017
24 2017
22 2017
31 2017
15 2017
25 2017
30 2017
26 2017
7  2017
23 2017
27 2017",header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
anti_join(data1,data2,by="ID")

  ID  NAME YEAR
1  4   Tom 2014
2  8   Ann 2016
3  9 Julie 2014
4 12  Emma 2015
5 13  Karl 2016
6 14 Rhett 2014
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56