How do I get the column names for ALL the nth CONSECUTIVE occurrence of a value in a data table row using R

Question

I'm looking for a way, to determine "join", "churn", and "rejoin" dates for clients in a data frame (clients as row labels, month/year as column labels, earnings as values) where:

The first month with a non-zero earnings value for a client = join date
If there are 13 consecutive months with no earnings (zero value), then the 13th month is considered a "churn" month.
If, after a "churn" month has occurred, there is a non-zero value for that client, then the corresponding month in that column is the "rejoin" month.
If there is another 13 months with no earnings AFTER a rejoin month, then the last month in that block of 13 zero value columns is another churn month

I could probably hack this together with For loops (see pseudo-code below), but I'm wondering if there is a more elegant way to do it with apply functions or with the dplyr package.

So, data frame looks like this:

    > clients
                  Jan08 Feb08 Mar08 Apr08 May08 ... 
    client 1      1     0     0     0     0     ... 
    client 2      0     -6    6    -6     6     ... 
    client 3      28    4     108   93    3.85  ... 
                   ............

and I'm looking to run logic like this:

    For each client {
      For each month {
        If the value this month == 0 {
          If the previous 12 months values all == 0* {
            If there is a non-zero value exactly 13 months before {
              then mark the current month as a churn month
            }
          }
        }
        Else {
          If this is the first non zero value in the row {
            then mark the current month as the join month
          }
        Else {
          If this is the first non-zero value in 13 or more months {
            then mark this as a rejoin month
          }
        }
      }
    }

*NOTE: As with client 2, I could have negative values, so I can't just sum up values to determine if proceeding values are all zero.

Any direction on this would be much appreciated.

score 1 · Answer 1 · edited May 23 '17 at 11:52

I think the best approach to this problem is to change it to a long dataframe rather than a wide one. For example, it could look something like this:

clients <- data.frame(
  client = c(rep(1, 15)),
  month = rep(1:15, 1),
  activity = c(1, rep(0, 13), 1)
)

Which looks like this:

> clients
   client month activity
1       1     1        1
2       1     2        0
3       1     3        0
4       1     4        0
5       1     5        0
6       1     6        0
7       1     7        0
8       1     8        0
9       1     9        0
10      1    10        0
11      1    11        0
12      1    12        0
13      1    13        0
14      1    14        0
15      1    15        1

Once you have done this then you can treat each client seperately via dplyr groupby method. All that remains is to make use of window functions to process the logic you have provided.

To make use of window functions I use library(zoo) and the rollapply method. This approach is something I used in another solution here, where you can see it being used in a similar problem.

From here we can simply apply the logic that we're after. So an extremely messy version could look like this:

clients %>% group_by(client) %>%
  mutate(
    isChurn = rollapply(activity, 13, function(x) {
      all(x == 0)
    }, align="right", fill=NA), # if there are 13 consecutive months with no earnings

    isRejoin = rollapply(activity, 14, function(x) {
      all(x[1:13] == 0) & x[14] != 0
    }, align="right", fill=NA)
  )

Which has the output based on the my data.frame:

Source: local data frame [15 x 5]
Groups: client

   client month activity isChurn isRejoin
1       1     1        1      NA       NA
2       1     2        0      NA       NA
3       1     3        0      NA       NA
4       1     4        0      NA       NA
5       1     5        0      NA       NA
6       1     6        0      NA       NA
7       1     7        0      NA       NA
8       1     8        0      NA       NA
9       1     9        0      NA       NA
10      1    10        0      NA       NA
11      1    11        0      NA       NA
12      1    12        0      NA       NA
13      1    13        0   FALSE       NA
14      1    14        0    TRUE    FALSE
15      1    15        1   FALSE     TRUE

I'm not sure if for the churn month you only want to mark the first occurrence as a churn month or not. But I think this will get you there and you should be able to figure out the rest.

Minor simplification rollapply instances: `rollapplyr(activity == 0, 13, all, fill=NA)` and `rollapplyr(activity == 0, 14, function(x) all(x[-14]) && !x[14], fill = NA)` . — G. Grothendieck, Nov 03 '15 at 14:19
Having some issues getting this to run. I get a 'wrong sing in 'by' argument error, which seems to indicate that the data types are not working with grouping. I should note that in the actual data, the clients are "client 1", "client 2"... they have actual character string names, so even when I flatten out the table, the client column is of type character. — DTuffy, Dec 29 '15 at 21:30
If you could provide a reproducible example then I'll have a look. — chappers, Jan 03 '16 at 08:06

How do I get the column names for ALL the nth CONSECUTIVE occurrence of a value in a data table row using R

1 Answers1