Suppose I have a R data frame. Each row represents a transaction made by someone on a specific date. There are many columns that have more information about the transaction,such as the money he/she spent and the number of commodities he/she bought. It is possible that one person has many transactions so that one person may have several rows. Say that I want to have a column that records how much the customer spent in last transaction. For now, I am using a for loop to look through the entire data frame to check if this customer has a previous transaction. If the customer has a previous transaction, then I add value to the field; if not, I just jump to the next row. It works but I am dealing with a data frame with more than 1.7 million rows so that the for loop is too slow for me. Do you have any better ideas solving the problem? Appreciate!!!
Asked
Active
Viewed 163 times
0
-
4It's easier to help you if you provide some sort of [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output so that possible solutions can be tested. It's doesn't have to be full scale but at least having something to play with is useful and less abstract. – MrFlick Jul 05 '17 at 21:07
1 Answers
0
Assuming you have a data.frame
like this
library(dplyr)
df <- read.table(header = TRUE, text = "
CustId transaction Spent
ID1 1 500
ID1 2 706
ID1 3 925
ID1 4 1195
ID1 5 1472
ID1 6 1413
ID2 1 950
ID2 2 1203
ID2 3 1486
ID2 4 1658")
df%>%group_by(CustId)%>%dplyr::mutate(Last=Spent[transaction==max(transaction)])
CustId transaction Spent Last
<fctr> <int> <int> <int>
1 ID1 1 500 1413
2 ID1 2 706 1413
3 ID1 3 925 1413
4 ID1 4 1195 1413
5 ID1 5 1472 1413
6 ID1 6 1413 1413
7 ID2 1 950 1658
8 ID2 2 1203 1658
9 ID2 3 1486 1658
10 ID2 4 1658 1658

BENY
- 317,841
- 20
- 164
- 234
-
My understanding is that OP is interested in previous transaction's amount : `df%>%group_by(CustId)%>%mutate(Last=lag(Spent))` – HubertL Jul 05 '17 at 22:44
-