I am working with the BTYD R package, and the probabilities that a customer is alive at the end of the calibration period come out extremely high. Even customers with only one transaction in the calibration period have a probability of around 0.9999. I know that the parameter "s" (estimated by the package) is used in this calculation. My gamma is very low (almost 0). When I manually changed it to a higher value, the probabilities went down. Any idea how to deal with this problem? My code is below.

elog <- dc.MergeTransactionsOnSameDate(elog)
end.of.cal.period <- min(elog$date)+as.numeric((max(elog$date)-min(elog$date))/2)

data <- dc.ElogToCbsCbt(elog, per="week", 
                        T.cal=end.of.cal.period,
                        merge.same.date=TRUE, 
                        statistic = "freq") 

cal2.cbs <- as.matrix(data[[1]][[1]])

## parameter estimation
params2 <- pnbd.EstimateParameters(cal2.cbs)

## log-likelihood
(LL <- pnbd.cbs.LL(params2, cal2.cbs))

p.matrix <- c(params2, LL)
for (i in 1:20) {
  params2 <- pnbd.EstimateParameters(cal2.cbs, params2)
  LL <- pnbd.cbs.LL(params2, cal2.cbs)
  p.matrix.row <- c(params2, LL)
  p.matrix <- rbind(p.matrix, p.matrix.row)
}

(params2 <- p.matrix[dim(p.matrix)[1],1:4])

# set up parameter names for a more descriptive result
param.names <- c("r", "alpha", "s", "beta")

LL <- pnbd.cbs.LL(params2, cal2.cbs)

# PROBABILITY A CUSTOMER IS ALIVE AT END OF CALIBRATION / TRAINING
x <- cal2.cbs["123", "x"]         # x is frequency
t.x <- cal2.cbs["123", "t.x"]     # t.x is recency, i.e. time of last transaction
T.cal <- 26 # week of end of cal, i.e. present
pnbd.PAlive(params2, x, t.x, T.cal)
Mila

1 Answer

There is no "gamma" parameter being estimated. "s" and "beta" are the shape and rate parameters of the gamma distribution that describes dropout rate heterogeneity. I recommend editing your post to include the parameters, as well as the output of

pnbd.PlotDropoutRateHeterogeneity(params2)

Without seeing your parameter estimates or knowing the context of your data, there are at least two (not mutually exclusive) possibilities.

First, you could have a very low (possibly zero) dropout rate. If so, you can still fit a plain NBD model of the transaction rate and assume a zero dropout rate.
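A minimal sketch of that NBD fallback in base R, in case it helps. It assumes the x (repeat transactions) and T.cal (observation length) columns of a CBS matrix like cal2.cbs. The data here are simulated stand-ins, and nbd_neg_ll and fit_nbd are hypothetical helper names, not BTYD functions.

```r
# Sketch: fit a plain NBD model (gamma-Poisson, no dropout) by maximum likelihood.
# nbd_neg_ll and fit_nbd are hypothetical helpers, not part of BTYD.

# Negative log-likelihood. Parameters are optimised on the log scale so
# that r and alpha stay positive.
nbd_neg_ll <- function(log_par, x, T.cal) {
  r     <- exp(log_par[1])
  alpha <- exp(log_par[2])
  # With lambda ~ Gamma(shape = r, rate = alpha), the count in (0, T.cal] is
  # negative binomial with size r and probability alpha / (alpha + T.cal).
  -sum(dnbinom(x, size = r, prob = alpha / (alpha + T.cal), log = TRUE))
}

fit_nbd <- function(x, T.cal, start = c(r = 1, alpha = 1)) {
  opt <- optim(log(start), nbd_neg_ll, x = x, T.cal = T.cal, method = "BFGS")
  list(params = exp(opt$par), LL = -opt$value, convergence = opt$convergence)
}

# Simulated stand-in for cal2.cbs[, "x"] and cal2.cbs[, "T.cal"] (assumption)
set.seed(1)
n      <- 2000
T.cal  <- runif(n, 10, 26)
lambda <- rgamma(n, shape = 1.2, rate = 8)
x      <- rpois(n, lambda * T.cal)

fit <- fit_nbd(x, T.cal)
fit$params
```

On real data you would pass cal2.cbs[, "x"] and cal2.cbs[, "T.cal"] in place of the simulated vectors.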

Second, you could be seeing the "increasing frequency paradox". From pages 17-19 of one of Peter Fader/Bruce Hardie's papers:

For low frequency customers, there is an almost linear relationship between recency and [expected transactions]. However, this relationship becomes highly nonlinear for high frequency customers. In other words, for customers who have made a relatively large number of transactions in the past, recency plays a much bigger role in determining [value] than for an infrequent past purchaser.

According to the authors, a customer such as the one you describe, with few transactions (or even just one), receives a high probability of being "alive" with less dependence on recency. This is because, by definition, a low-frequency customer can have long "gaps" between purchases. Therefore we should assign less risk to a lower-frequency customer even if they have not transacted for some time. Compare this to a high-frequency customer: the longer we go without seeing a transaction, the faster we should conclude that the customer is "dead", since we know they would ordinarily be making many transactions.
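To make the intuition concrete, here is a rough sketch using the individual-level Pareto/NBD expression for P(alive) when a customer's transaction rate lambda and dropout rate mu are treated as known. This is not what pnbd.PAlive computes (that function averages over the fitted gamma heterogeneity). The helper name p_alive_known_rates and all rate values are assumptions chosen for illustration.

```r
# P(alive | lambda, mu, t.x, T.cal) for a customer whose rates are known:
#   1 / (1 + mu / (lambda + mu) * (exp((lambda + mu) * (T.cal - t.x)) - 1))
# p_alive_known_rates is a hypothetical helper, not a BTYD function.
p_alive_known_rates <- function(lambda, mu, t.x, T.cal) {
  1 / (1 + mu / (lambda + mu) * expm1((lambda + mu) * (T.cal - t.x)))
}

T.cal <- 26    # end of calibration, in weeks
t.x   <- 6     # last purchase 20 weeks before the end of calibration
mu    <- 0.05  # illustrative dropout rate per week (assumption)

# Same 20-week gap, different purchase rates (both values are assumptions)
p_low  <- p_alive_known_rates(lambda = 0.05, mu = mu, t.x = t.x, T.cal = T.cal)
p_high <- p_alive_known_rates(lambda = 1.00, mu = mu, t.x = t.x, T.cal = T.cal)
c(low_frequency = p_low, high_frequency = p_high)

# As mu approaches zero, P(alive) approaches 1 whatever the gap
p_near_zero_mu <- p_alive_known_rates(lambda = 0.05, mu = 1e-8,
                                      t.x = t.x, T.cal = T.cal)
p_near_zero_mu
```

The same silence is much stronger evidence of dropout for the frequent buyer than for the infrequent one, and with a dropout rate near zero the probability stays close to 1 for everyone.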

Geoffrey
  • Thank you for the answer, @Geoffrey! I will try removing observations with few transactions. Meanwhile, here is my params2: 1.212466301 7.854141863 0.004670594 1.061439413. I have one year of observations, so the calibration period lasts half a year. My pnbd.PlotDropoutRateHeterogeneity(params2) is here ![Valid XHTML](http://www.laurenscoster.com/hubfs/img/graph_hdp.png). – Mila Aug 06 '15 at 09:06
  • From the plot, it looks to me like you have a near-zero dropout rate; note the mean of about 0.004. It may be that your time period (half a year) isn't long enough to observe the rate at which your customers drop out. However, removing low-transaction-rate customers isn't the answer, as you'll simply lose the information contained in that part of your customer distribution. I suggest trying to fit a plain NBD model instead, with no dropout parameters. – Geoffrey Aug 07 '15 at 16:54
  • You are welcome. Please mark this question as solved. – Geoffrey Aug 13 '15 at 19:08
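
The mean quoted in the comments can be checked directly from the posted params2, reading s as the shape and beta as the rate of the dropout-rate gamma distribution. This is only a sketch: the 1e-6 cutoff is an arbitrary choice made for illustration.

```r
s    <- 0.004670594   # shape, third value of the posted params2
beta <- 1.061439413   # rate, fourth value of the posted params2

# Mean of Gamma(shape = s, rate = beta): the average dropout rate per week
mean_dropout <- s / beta
mean_dropout

# Share of customers whose dropout rate is below 1e-6 per week
# (the cutoff is arbitrary, chosen only to show how much mass sits at zero)
share_tiny <- pgamma(1e-6, shape = s, rate = beta)
share_tiny
```

With a shape this small, more than 90% of the fitted distribution sits at dropout rates that are effectively zero, which is consistent with P(alive) values near 1 for almost every customer.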