
Let's say I have a data.frame of monthly sales tickets, with a client ID, the month and the amount of money.

head(tickets)
  id_client      month sales
1   ID87160 2016-01-01 16875
2   ID18694 2016-01-01   448
3   ID20624 2016-01-01 16311
4  ID171683 2016-01-01   314
5  ID214926 2016-01-01  8889
6   ID82071 2016-01-01  7479

I have another data.frame with the date on which each client cancelled their subscription.

head(stop_being_client)
  id_client       date
1  ID235005 2016-03-01
2   ID50615 2016-04-01
3   ID72078 2016-03-01
4  ID129556 2016-01-01
5  ID204060 2016-04-01
6   ID57769 2016-01-01

Now I need to check that the tickets table contains no records for clients who have unsubscribed, i.e. no row whose month in tickets is later than that client's date in stop_being_client.

In PostgreSQL this would be easy:

SELECT
    *
FROM
    tickets
JOIN
    stop_being_client
ON
    tickets.id_client = stop_being_client.id_client
WHERE
    tickets.month > stop_being_client.date;
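For comparison, the same inner join plus filter can be written in base R with merge(). This is a minimal sketch that assumes month and date are Date columns; the few sample rows are copied from the example in the EDIT.

```r
# a few rows from the example in the question
tickets <- data.frame(
  id_client = c("ID2", "ID1", "ID1", "ID3", "ID2"),
  month = as.Date(c("2016-01-01", "2016-03-01", "2016-04-01", "2016-04-01", "2016-05-01")),
  sales = c(12698, 13440, 19010, 7129, 4726)
)
stop_being_client <- data.frame(
  id_client = c("ID1", "ID2"),
  date = as.Date(c("2016-03-01", "2016-04-01"))
)

# JOIN ... ON tickets.id_client = stop_being_client.id_client
# (merge() performs an inner join by default, so ID3 is dropped)
joined <- merge(tickets, stop_being_client, by = "id_client")

# WHERE tickets.month > stop_being_client.date
result <- joined[joined$month > joined$date, ]
result
```

Like the SQL version, the result carries both the month and the date columns, which makes it easy to see why each row was flagged.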

But I have no idea how to do it in R. I tried this:

tickets[which(
    tickets$id_client %in% stop_being_client$id_client &
    tickets$month > stop_being_client$date
    ),]

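But I'm pretty sure the result is not what I want: tickets$month > stop_being_client$date compares the two columns element by element (recycling the shorter one), so a ticket is not compared with its own client's date. Somehow I need to relate the id_client from both tables when comparing the dates.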
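One way to keep the which() approach is to look up each ticket's own stop date with match() before comparing. This is a sketch assuming Date columns, again using a few rows from the example; clients that never cancelled get NA, and which() drops those comparisons.

```r
# a few rows from the example in the question
tickets <- data.frame(
  id_client = c("ID2", "ID1", "ID1", "ID3", "ID2"),
  month = as.Date(c("2016-01-01", "2016-03-01", "2016-04-01", "2016-04-01", "2016-05-01")),
  sales = c(12698, 13440, 19010, 7129, 4726)
)
stop_being_client <- data.frame(
  id_client = c("ID1", "ID2"),
  date = as.Date(c("2016-03-01", "2016-04-01"))
)

# stop date of each ticket's own client (NA if the client never cancelled)
stop_date <- stop_being_client$date[match(tickets$id_client, stop_being_client$id_client)]

# which() ignores the NA comparisons, so clients without a stop date are kept out
bad <- tickets[which(tickets$month > stop_date), ]
bad
```

Because stop_date has one entry per ticket, the comparison is now aligned row by row instead of relying on recycling.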

EDIT: Here is an example:

This is the tickets data.frame:

id_client      month sales
      ID2 2016-01-01 12698
      ID1 2016-01-01  8626
      ID2 2016-02-01 18309
      ID1 2016-02-01 15653
      ID3 2016-02-01  9642
      ID3 2016-03-01 18376
      ID1 2016-03-01 13440
      ID2 2016-03-01  2322
      ID1 2016-04-01 19010
      ID3 2016-04-01  7129
      ID2 2016-04-01 14694
      ID2 2016-05-01  4726
      ID1 2016-05-01   706
      ID3 2016-05-01 16995
      ID1 2016-06-01 18743
      ID3 2016-06-01 16725
      ID2 2016-07-01  2632

This is the table stop_being_client:

id_client       date
      ID1 2016-03-01
      ID2 2016-04-01

So I want to detect the rows in tickets that shouldn't exist, which in this case are:

id_client      month sales
      ID1 2016-04-01 19010
      ID2 2016-05-01  4726
      ID1 2016-05-01   706
      ID1 2016-06-01 18743
      ID2 2016-07-01  2632
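Since a reproducible example was requested, here is the example above as R code that can be pasted into a session. The values are copied from the two tables and from the expected result; storing the dates as Date rather than character is an assumption about the real data.

```r
# tickets table from the example (values copied from the question)
tickets <- data.frame(
  id_client = c("ID2", "ID1", "ID2", "ID1", "ID3", "ID3", "ID1", "ID2", "ID1",
                "ID3", "ID2", "ID2", "ID1", "ID3", "ID1", "ID3", "ID2"),
  month = as.Date(c("2016-01-01", "2016-01-01", "2016-02-01", "2016-02-01", "2016-02-01",
                    "2016-03-01", "2016-03-01", "2016-03-01", "2016-04-01", "2016-04-01",
                    "2016-04-01", "2016-05-01", "2016-05-01", "2016-05-01", "2016-06-01",
                    "2016-06-01", "2016-07-01")),
  sales = c(12698, 8626, 18309, 15653, 9642, 18376, 13440, 2322, 19010,
            7129, 14694, 4726, 706, 16995, 18743, 16725, 2632)
)

# cancellation dates
stop_being_client <- data.frame(
  id_client = c("ID1", "ID2"),
  date = as.Date(c("2016-03-01", "2016-04-01"))
)

# rows that should not exist, in the order given in the question
expected <- data.frame(
  id_client = c("ID1", "ID2", "ID1", "ID1", "ID2"),
  month = as.Date(c("2016-04-01", "2016-05-01", "2016-05-01", "2016-06-01", "2016-07-01")),
  sales = c(19010, 4726, 706, 18743, 2632)
)
```

For real data, dput(tickets) prints an equivalent definition that can be pasted into a question.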
Sotos
Can you make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and show what the output would look like? – Roman Luštrik Apr 25 '17 at 09:15

2 Answers


Here is an idea using base R:

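# split the tickets by client
l4 <- split(tickets, tickets$id_client)
# ind1 is not defined in the original answer; assumed here to be each client's stop
# date, repeated for every row of that client (NA if the client never cancelled)
ind1 <- lapply(l4, function(i) stop_being_client$date[match(i$id_client, stop_being_client$id_client)])
do.call(rbind, lapply(Map(cbind, l4, temp = ind1), function(i){
                                     i <- i[i$month > i$temp[!is.na(i$temp)],]; 
                                     i$temp <- NULL; i
                                     }))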


#       id_client      month sales
#ID1.9        ID1 2016-04-01 19010
#ID1.13       ID1 2016-05-01   706
#ID1.15       ID1 2016-06-01 18743
#ID2.12       ID2 2016-05-01  4726
#ID2.17       ID2 2016-07-01  2632
Sotos

With data.table:

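library(data.table)
setDT(tickets)            # convert both data.frames to data.tables in place
setDT(stop_being_client)

# non-equi join: keep tickets whose month is later than the client's stop date
stop_being_client[tickets, on = .(date < month, id_client == id_client), nomatch = 0,
                  .(id_client, month, date, sales)]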

   id_client      month       date sales
1:       ID1 2016-04-01 2016-04-01 19010
2:       ID2 2016-05-01 2016-05-01  4726
3:       ID1 2016-05-01 2016-05-01   706
4:       ID1 2016-06-01 2016-06-01 18743
5:       ID2 2016-07-01 2016-07-01  2632
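In the output above the date column equals month. That is data.table's behaviour for non-equi joins: the join columns of x are filled with the matching values from i. Prefixing the column with x. in j returns stop_being_client's own date instead. This sketch assumes the data.table package is installed and uses a few rows from the question's example; the result column name stop_date is just an illustrative choice.

```r
library(data.table)

# a few rows from the example in the question
tickets <- data.table(
  id_client = c("ID1", "ID1", "ID2", "ID2"),
  month = as.Date(c("2016-03-01", "2016-04-01", "2016-04-01", "2016-05-01")),
  sales = c(13440, 19010, 14694, 4726)
)
stop_being_client <- data.table(
  id_client = c("ID1", "ID2"),
  date = as.Date(c("2016-03-01", "2016-04-01"))
)

# x.date refers to stop_being_client$date; a plain `date` here would show
# the tickets month that satisfied date < month
res <- stop_being_client[tickets, on = .(id_client, date < month), nomatch = 0,
                         .(id_client, month, stop_date = x.date, sales)]
res
```

With the stop date visible next to each flagged month, it is easier to confirm that every returned row really falls after the cancellation.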
Erdem Akkas