My data frame contains a column with various touch points, numbered 1 to 18. I want to know which touch point leads to touch point 10. Therefore I want to create a new column that shows the touch point that occurred immediately before touch point 10 in each customer journey (PurchaseId). If touch point 10 doesn't occur in a customer journey, the value can be NULL or 0. For example:

dd <- read.table(text="
PurchaseId  TouchPoint DesiredOutcome
1           8          6
1           6          6
1           10         6
2           12         0
2           8          0
3           17         4
3           3          4
3           4          4
3           10         4", header=TRUE)

The complete dataset contains 2,500,000 observations. Does anyone know how to solve this? Thanks in advance.

MrFlick
  • This is pretty unclear. What have you tried so far? What's the output you're trying to get? – camille May 06 '19 at 16:02
  • The output I'm trying to get is the column 'DesiredOutcome'. I have tried some codes with lag duplication and loops, but that didn't work for me. However, my R skills aren't that advanced. – Freek Spithoven May 07 '19 at 12:14
  • Even if your code doesn't work, it's probably better to post something. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. – camille May 07 '19 at 13:37

1 Answer

First, it is better to provide complete, reproducible sample code. I suggest looking at the data.table package, which handles large datasets well.

library(data.table)
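# Simulated data: 15 rows x 10 columns (cj1..cj10) of random touch points 1..20;
# the placeholder value 21 is turned into NA
mdata <- matrix(sample(x = c(1:20, 21), size = 15*10, replace = TRUE), ncol = 10)
mdata[mdata==21] <- NA
mdata <- data.frame(mdata)
names(mdata) <- paste0("cj", 1:10)
df_touch <- data.table(mdata)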

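# -- using for: res[i] is i if column cj<i> contains touch point 10, otherwise 0
res <- rep(0, ncol(df_touch))  # one entry per column (was nrow, which left unused trailing zeros)
for (i in seq_len(ncol(df_touch))) {
        cat(i, "\n")
        res[i] <- i * df_touch[, 10 %in% get(paste0("cj", i))]
        cat(res[i], "\n")
}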

# -- using lapply: one TRUE/FALSE per column, TRUE if the column contains k
dfun <- function(x, k = 10) k %in% x
df_touch[, lapply(.SD, dfun)]
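
Note that the snippets above only test whether touch point 10 occurs in a column; they do not produce the DesiredOutcome column from the question. A minimal sketch of one way to get that column with data.table follows. It assumes the rows are already in chronological order within each PurchaseId, that only the first occurrence of 10 matters, and that 0 is used when 10 is absent or is the first touch point. The helper `before_target` and the column name `PrevTouchPoint` are made up for this illustration.

```r
library(data.table)

# Sample data from the question, without the DesiredOutcome column
dd <- data.table(
  PurchaseId = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  TouchPoint = c(8, 6, 10, 12, 8, 17, 3, 4, 10)
)

# Hypothetical helper: the touch point immediately before the first
# occurrence of `target` in x, or 0 if target is absent or comes first
before_target <- function(x, target = 10) {
  pos <- match(target, x)  # position of the first match, NA if none
  if (is.na(pos) || pos == 1L) return(0)
  as.numeric(x[pos - 1L])  # as.numeric keeps the type identical across groups
}

# One value per journey, recycled over all rows of that PurchaseId
dd[, PrevTouchPoint := before_target(TouchPoint), by = PurchaseId]
print(dd)
```

On the sample data this reproduces the DesiredOutcome values from the question (6 for journey 1, 0 for journey 2, 4 for journey 3). If the real data is not sorted by time, it would need to be ordered within each journey first.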
cbo