In R dataframe, how to search for previous transaction information

Question

Suppose I have a R data frame. Each row represents a transaction made by someone on a specific date. There are many columns that have more information about the transaction,such as the money he/she spent and the number of commodities he/she bought. It is possible that one person has many transactions so that one person may have several rows. Say that I want to have a column that records how much the customer spent in last transaction. For now, I am using a for loop to look through the entire data frame to check if this customer has a previous transaction. If the customer has a previous transaction, then I add value to the field; if not, I just jump to the next row. It works but I am dealing with a data frame with more than 1.7 million rows so that the for loop is too slow for me. Do you have any better ideas solving the problem? Appreciate!!!

It's easier to help you if you provide some sort of [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output so that possible solutions can be tested. It's doesn't have to be full scale but at least having something to play with is useful and less abstract. — MrFlick, Jul 05 '17 at 21:07

BENY · Answer 1 · 2017-07-05T21:23:00.457

Assuming you have a data.frame like this

library(dplyr)
df <- read.table(header = TRUE, text = "
                  CustId  transaction      Spent
                  ID1 1     500
                  ID1 2     706
                  ID1 3     925
                  ID1 4     1195
                  ID1 5    1472
                  ID1 6    1413
                  ID2  1    950
                  ID2 2    1203
                  ID2 3    1486
                  ID2 4    1658")

df%>%group_by(CustId)%>%dplyr::mutate(Last=Spent[transaction==max(transaction)])


  CustId  transaction Spent  Last
   <fctr>       <int> <int> <int>
1     ID1           1   500  1413
2     ID1           2   706  1413
3     ID1           3   925  1413
4     ID1           4  1195  1413
5     ID1           5  1472  1413
6     ID1           6  1413  1413
7     ID2           1   950  1658
8     ID2           2  1203  1658
9     ID2           3  1486  1658
10    ID2           4  1658  1658

My understanding is that OP is interested in previous transaction's amount : `df%>%group_by(CustId)%>%mutate(Last=lag(Spent))` — HubertL, Jul 05 '17 at 22:44

In R dataframe, how to search for previous transaction information

1 Answers1