0

short questions: I want to move the data in one column one row up by group: within the group, the first row is replaced by second row data, second replace by third etc. The last row of the new column is 0. I have 40,000 grouping levels and 230,000 row in total.

Long questions: I have a panel data regarding consumer purchase, multiple users purchase at multiple times. The group variable is the consumer with brand choice data. I want to replicate a second choice data to calculate the repurchase rate for a certain brand of the same user. To define the repurchase : the repurchase of the "1st purchase" is the "2nd purchase"; And the repurchase of the "2nd purchase" is the "3rd purchase" The last purchase has no repurchase.

My code takes me 8 minutes for this simple data manipulations in r, but only 1 second in Excel.

Data is like(sorted by day shopper first and day):

    Day Shopper Choice
    1   A   Coke
    2   A   Coke
    1   B   Sprite
    1   C   Coke
    2   C   Pepsi
    3   C   Coke
    1   D   Sprite
    2   D   Sprite

The desired output:

Day Shopper Choice  choice 2
 1  A   apple        *apple*
 2  A   apple           0
 1  B   Banana          0
 1  C   apple          Banana
 2  C   Banana         apple 
 3  C   apple           0
 1  D   berry           *berry*
 2  D   berry           0

My original code is

# sort the data by user first and then by day
# choice.2 is the new column name
n<-nrow(dt) 
for (i in 1:n) {
 if (df$shopper[i]==dt$shopper[i+1]) 
{choice.2[i]<-choice[i+1]}
 else {choice.2[i]<-0}}
Community
  • 1
  • 1
MLE
  • 1,033
  • 1
  • 11
  • 30

1 Answers1

2

If you mean that you want to group by shopper and then lead Choice, filling with zeros, with dplyr,

library(dplyr)

df %>% group_by(Shopper) %>% mutate(choice2 = lead(as.character(Choice), default = '0'))
## Source: local data frame [8 x 4]
## Groups: Shopper [4]
## 
##     Day Shopper Choice choice2
##   <int>  <fctr> <fctr>   <chr>
## 1     1       A  apple   apple
## 2     2       A  apple       0
## 3     1       B Banana       0
## 4     1       C  apple  Banana
## 5     2       C Banana   apple
## 6     3       C  apple       0
## 7     1       D  berry   berry
## 8     2       D  berry       0 

or with data.table, something like

library(data.table)

setDT(df)[, choice2 := shift(as.character(Choice), type = 'lead', fill = '0') , by = Shopper][]
##    Day Shopper Choice choice2
## 1:   1       A  apple   apple
## 2:   2       A  apple       0
## 3:   1       B Banana       0
## 4:   1       C  apple  Banana
## 5:   2       C Banana   apple
## 6:   3       C  apple       0
## 7:   1       D  berry   berry
## 8:   2       D  berry       0

or in base,

df$choice2 <- ave(as.character(df$Choice), df$Shopper, FUN = function(x){c(x[-1], '0')})
df
##   Day Shopper Choice choice2
## 1   1       A  apple   apple
## 2   2       A  apple       0
## 3   1       B Banana       0
## 4   1       C  apple  Banana
## 5   2       C Banana   apple
## 6   3       C  apple       0
## 7   1       D  berry   berry
## 8   2       D  berry       0

If Choice is a factor, all versions coerce choice2 to character instead, which comes with some time penalties. If you add "0" to the factor levels, the same approaches should work without coercion.

alistaire
  • 42,459
  • 4
  • 77
  • 117