I want to calculate the number of months since the last month with a purchase. My data frame looks like this:

df
id month purchases
1  1     3
1  2     0
1  3     0
1  4     1
2  1     1
2  2     0
2  3     3
2  4     1
(100 more rows omitted)

I want to use a for loop to get a data frame like this:

id month purchases recency
1  1     3          NA
1  2     0          1
1  3     0          2
1  4     1          3
2  1     1          NA
2  2     0          1
2  3     3          2
2  4     1          1
(100 more rows omitted)
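
For reproducibility, the eight rows shown above can be rebuilt as follows. This is only a sketch: it covers the displayed rows, not the 100 omitted ones, and it assumes all three columns are integers.

```r
# Sample data: only the eight rows displayed in the question
df <- data.frame(
  id        = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  month     = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  purchases = c(3L, 0L, 0L, 1L, 1L, 0L, 3L, 1L)
)
```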
realr
Zoey Ying

2 Answers

The difficult part is getting the recency for rows where purchases != 0. One way, using dplyr:

library(dplyr)

df %>%
  group_by(id, group = cumsum(purchases != 0)) %>%
  mutate(recency = month - first(month)) %>%
  ungroup() %>%
  select(-group) %>%
  group_by(id) %>%
  mutate(recency = ifelse(recency == 0, lag(recency) + month - lag(month), recency))

#     id month purchases recency
#  <int> <int>     <int>   <int>
#1     1     1         3      NA
#2     1     2         0       1
#3     1     3         0       2
#4     1     4         1       3
#5     2     1         1      NA
#6     2     2         0       1
#7     2     3         3       2
#8     2     4         1       1

To explain: we first group by id and by cumsum(purchases != 0), a running counter that starts a new group at every month with a purchase. Within each group, recency is month minus first(month), the month in which the group starts, which gives

df %>%
  group_by(id, group = cumsum(purchases != 0)) %>%
  mutate(recency = month - first(month))

#     id month purchases group recency
#  <int> <int>     <int> <int>   <int>
#1     1     1         3     1       0
#2     1     2         0     1       1
#3     1     3         0     1       2
#4     1     4         1     2       0
#5     2     1         1     3       0
#6     2     2         0     3       1
#7     2     3         3     4       0
#8     2     4         1     5       0

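This is almost what we want. The exception is rows where purchases != 0: their recency is 0, but it should be the number of months since the previous purchase for the same id. A second group_by(id) with ifelse() fixes that. Wherever recency is 0, it is replaced by the previous row's recency plus the gap in months between the two rows. The first row of each id becomes NA because lag() has nothing to look back at.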
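
The two steps can also be traced in base R without dplyr. This is a minimal sketch on the eight sample rows, assuming they are sorted by id and then by month; `lag_by_id()` is a hypothetical helper that stands in for a grouped `lag()`.

```r
# Sample columns from the question
id        <- c(1, 1, 1, 1, 2, 2, 2, 2)
month     <- c(1, 2, 3, 4, 1, 2, 3, 4)
purchases <- c(3, 0, 0, 1, 1, 0, 3, 1)

# Step 1: running counter that increases at every month with a purchase
group <- cumsum(purchases != 0)
group
# 1 1 1 2 3 3 4 5

# Months since the first row of each (id, group) block
recency <- ave(month, id, group, FUN = function(m) m - m[1])
recency
# 0 1 2 0 0 1 0 0

# Step 2: previous row's value within the same id (hypothetical helper)
lag_by_id <- function(x) ave(x, id, FUN = function(v) c(NA, head(v, -1)))

# Replace the zeros by previous recency plus the gap between the two months
recency <- ifelse(recency == 0,
                  lag_by_id(recency) + month - lag_by_id(month),
                  recency)
recency
# NA 1 2 3 NA 1 2 1
```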

Ronak Shah
  • I put the code in RStudio and it says that "There were 50 or more warnings (use warnings() to see the first 50)" – Zoey Ying Aug 11 '19 at 04:45
  • @ZoeyYing OK, those are warnings, not errors. Can you check the output of the above method? Does it give you what you expect? – Ronak Shah Aug 11 '19 at 05:09
  • @ZoeyYing You might have `NA`s in your data frame. Can you confirm which column has `NA`s and what you want to do with them? – Ronak Shah Aug 11 '19 at 13:08

I see you wanted an answer with for-loops. Here's one:

months_since_last_purchase <- function(df) {
  # Assumes rows are sorted by id and, within each id, by month

  df$recency <- NA           # create an empty column to store recency
  current_id <- NULL         # id we are currently looping over
  last_purchase <- NA        # month of that id's most recent purchase

  for (row in seq_len(nrow(df))) {    # loop through our rows

    if (!identical(df$id[row], current_id)) {  # if we have moved on to a new id
      current_id <- df$id[row]                 #     remember it
      last_purchase <- NA                      #     and forget the previous id's purchases
    }

    # months since the last purchase before this row (NA if there was none)
    df$recency[row] <- df$month[row] - last_purchase

    if (df$purchases[row] != 0) {         # if we purchased something this month
      last_purchase <- df$month[row]      #     later rows count from this month
    }
  }
  df                           # return the modified data frame
}

To use the function on your df, call it like this:

new_df <- months_since_last_purchase(df)

If I plan to reuse a function like this, I save it in its own file, for example in a directory called scripts, and load it with:

source("scripts/months_since_last_purchase.R")

Output:

     id month purchases recency
1     1     1         3      NA
2     1     2         0       1
3     1     3         0       2
4     1     4         1       3
5     2     1         1      NA
6     2     2         0       1
7     2     3         3       2
8     2     4         1       1

R often frowns on for-loops as vector operations are faster and more elegant, but I still find for-loops handy when speed is not important.
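
For comparison, here is a loop-free sketch in base R. It is not part of the function above: `carry_forward()` and `recency_vectorised()` are hypothetical helper names, and the code assumes the rows are sorted by id and then by month.

```r
# Hypothetical helper: replace each NA with the latest non-NA value before it
carry_forward <- function(x) {
  # position of the latest non-NA value seen so far, 0 if there is none yet
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  c(NA, x)[idx + 1]          # position 0 maps to the leading NA
}

# Hypothetical helper: months since the previous purchase, per id
recency_vectorised <- function(df) {
  # month of each purchase, NA for months without one
  purchase_month <- ifelse(df$purchases != 0, df$month, NA)
  # most recent purchase month among the earlier rows of the same id
  last_purchase <- ave(purchase_month, df$id, FUN = function(x) {
    carry_forward(c(NA, head(x, -1)))  # shift by one row to exclude the current month
  })
  df$recency <- df$month - last_purchase
  df
}

# The eight sample rows from the question
df <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2, 2),
  month     = c(1, 2, 3, 4, 1, 2, 3, 4),
  purchases = c(3, 0, 0, 1, 1, 0, 3, 1)
)
recency_vectorised(df)$recency
# NA 1 2 3 NA 1 2 1
```

The shift by one row is what keeps a purchase in the current month from counting as its own "last purchase".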

indubitably
  • I tried to run this code in RStudio, but the output of new_df only shows the old data frame. – Zoey Ying Aug 11 '19 at 11:21
  • Apologies, perhaps I should have been more explicit. My code creates a function called `months_since_last_purchase()`. Copy it, paste it into the console and hit enter to create the function, then convert your df with `new_df <- months_since_last_purchase(df)`. Alternatively, just cut and paste everything but the first and last lines and it will work. – indubitably Aug 13 '19 at 08:03