Massive group variables: Move column data ONE ROW UP within the group EFFICIENTLY [Panel data]

Question

short questions: I want to move the data in one column one row up by group: within the group, the first row is replaced by second row data, second replace by third etc. The last row of the new column is 0. I have 40,000 grouping levels and 230,000 row in total.

Long questions: I have a panel data regarding consumer purchase, multiple users purchase at multiple times. The group variable is the consumer with brand choice data. I want to replicate a second choice data to calculate the repurchase rate for a certain brand of the same user. To define the repurchase : the repurchase of the "1st purchase" is the "2nd purchase"; And the repurchase of the "2nd purchase" is the "3rd purchase" The last purchase has no repurchase.

My code takes me 8 minutes for this simple data manipulations in r, but only 1 second in Excel.

Data is like(sorted by day shopper first and day):

    Day Shopper Choice
    1   A   Coke
    2   A   Coke
    1   B   Sprite
    1   C   Coke
    2   C   Pepsi
    3   C   Coke
    1   D   Sprite
    2   D   Sprite

The desired output:

Day Shopper Choice  choice 2
 1  A   apple        *apple*
 2  A   apple           0
 1  B   Banana          0
 1  C   apple          Banana
 2  C   Banana         apple 
 3  C   apple           0
 1  D   berry           *berry*
 2  D   berry           0

My original code is

# sort the data by user first and then by day
# choice.2 is the new column name
n<-nrow(dt) 
for (i in 1:n) {
 if (df$shopper[i]==dt$shopper[i+1]) 
{choice.2[i]<-choice[i+1]}
 else {choice.2[i]<-0}}

`dplyr::lead` or `data.table::shift` – alistaire Jul 10 '16 at 02:35 — alistaire, Jul 10 '16 at 02:35

score 2 · Accepted Answer · answered Jul 10 '16 at 03:23

If you mean that you want to group by shopper and then lead Choice, filling with zeros, with dplyr,

library(dplyr)

df %>% group_by(Shopper) %>% mutate(choice2 = lead(as.character(Choice), default = '0'))
## Source: local data frame [8 x 4]
## Groups: Shopper [4]
## 
##     Day Shopper Choice choice2
##   <int>  <fctr> <fctr>   <chr>
## 1     1       A  apple   apple
## 2     2       A  apple       0
## 3     1       B Banana       0
## 4     1       C  apple  Banana
## 5     2       C Banana   apple
## 6     3       C  apple       0
## 7     1       D  berry   berry
## 8     2       D  berry       0

or with data.table, something like

library(data.table)

setDT(df)[, choice2 := shift(as.character(Choice), type = 'lead', fill = '0') , by = Shopper][]
##    Day Shopper Choice choice2
## 1:   1       A  apple   apple
## 2:   2       A  apple       0
## 3:   1       B Banana       0
## 4:   1       C  apple  Banana
## 5:   2       C Banana   apple
## 6:   3       C  apple       0
## 7:   1       D  berry   berry
## 8:   2       D  berry       0

or in base,

df$choice2 <- ave(as.character(df$Choice), df$Shopper, FUN = function(x){c(x[-1], '0')})
df
##   Day Shopper Choice choice2
## 1   1       A  apple   apple
## 2   2       A  apple       0
## 3   1       B Banana       0
## 4   1       C  apple  Banana
## 5   2       C Banana   apple
## 6   3       C  apple       0
## 7   1       D  berry   berry
## 8   2       D  berry       0

If Choice is a factor, all versions coerce choice2 to character instead, which comes with some time penalties. If you add "0" to the factor levels, the same approaches should work without coercion.

Massive group variables: Move column data ONE ROW UP within the group EFFICIENTLY [Panel data]

1 Answers1