short questions: I want to move the data in one column one row up by group: within the group, the first row is replaced by second row data, second replace by third etc. The last row of the new column is 0. I have 40,000 grouping levels and 230,000 row in total.
Long questions: I have a panel data regarding consumer purchase, multiple users purchase at multiple times. The group variable is the consumer with brand choice data. I want to replicate a second choice data to calculate the repurchase rate for a certain brand of the same user. To define the repurchase : the repurchase of the "1st purchase" is the "2nd purchase"; And the repurchase of the "2nd purchase" is the "3rd purchase" The last purchase has no repurchase.
My code takes me 8 minutes for this simple data manipulations in r, but only 1 second in Excel.
Data is like(sorted by day shopper first and day):
Day Shopper Choice
1 A Coke
2 A Coke
1 B Sprite
1 C Coke
2 C Pepsi
3 C Coke
1 D Sprite
2 D Sprite
The desired output:
Day Shopper Choice choice 2
1 A apple *apple*
2 A apple 0
1 B Banana 0
1 C apple Banana
2 C Banana apple
3 C apple 0
1 D berry *berry*
2 D berry 0
My original code is
# sort the data by user first and then by day
# choice.2 is the new column name
n<-nrow(dt)
for (i in 1:n) {
if (df$shopper[i]==dt$shopper[i+1])
{choice.2[i]<-choice[i+1]}
else {choice.2[i]<-0}}