
I'm trying to use the bartCause package to build an uplift model in R. Unfortunately, I'm having trouble adding the predictions to the data frame correctly. I get this error message:

Error in `$<-.data.frame`(`*tmp*`, "lift", value = c(0.159231848781688,  : 
  replacement has 160 rows, data has 2595

Code used:

  x = as.matrix(calibration[,-c(1:3)])
  y = calibration$churn
  z = calibration$treatment

  bart = bartc(y, z, x,
               method.trt = "bart", 
               method.rsp = "bart", 
               estimand="att", #average treatment effect on the treated
               n.samples = 20L, 
               n.chains = 8L, #Integer specifying how many independent tree sets and fits should be calculated.
               n.burn = 10L,
               n.threads = 4L, #Integer specifying how many threads to use for parallelization
               n.trees = 1000L,
               keepTrees = TRUE, #necessary for prediction!
               verbose = FALSE)

  pred_uplift <- predict(bart, validation[,-c(1:3)], combineChains = TRUE)
  pred <- pred_uplift
  validation$lift <- - pred[,1] + pred[,2]
  

calibration data: (2595 obs. of 15 variables)

validation data: (2595 obs. of 15 variables)

desertnaut
Marcel
  • Could you try to produce a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? Could you also provide the data with dput() instead of an image, so that others can recreate it? – Waldi Jul 18 '20 at 11:15

1 Answer


Here is what I found about your code and the error message. The "predict" method in the package returns a 160 × N matrix, where N is the number of rows in your validation dataset. The 160 matches the number of posterior draws in your call (n.samples = 20 times n.chains = 8, with combineChains = TRUE). Each column of this matrix therefore corresponds to one row of your validation dataset, so you first need to transpose the matrix:

pred = t(pred_uplift)

Then, you can calculate the "lift" variable using whatever column you need from the "pred" matrix:

validation$lift = pred[,1] + pred[,2]

BTW: I don't know what each column of the "pred" matrix means or why you use the first two columns (I assume you do), but the code above runs without the error.
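
To make the shape issue concrete, here is a minimal base R sketch. It does not call bartCause; it assumes, as described above, that the prediction output is a matrix with one row per posterior draw (160) and one column per validation row (2595). The matrix is filled with random numbers and the `id` and `post_mean` columns are made up for illustration.

```r
# Simulated stand-in for the predict() output: 160 posterior draws (rows)
# for 2595 validation rows (columns). Random values, not model output.
set.seed(1)
n_draws <- 20L * 8L   # n.samples * n.chains = 160
n_obs   <- 2595L
pred_uplift <- matrix(rnorm(n_draws * n_obs), nrow = n_draws, ncol = n_obs)

# Hypothetical validation data frame with the same number of rows.
validation <- data.frame(id = seq_len(n_obs))

# One column of the untransposed matrix holds 160 values, so assigning it
# to a 2595-row data frame fails with "replacement has 160 rows".
res <- try(validation$lift <- pred_uplift[, 1], silent = TRUE)
failed <- inherits(res, "try-error")

# Averaging over the draws gives exactly one value per validation row.
validation$post_mean <- colMeans(pred_uplift)
```

After transposing, a single column such as `pred[,1]` is just one posterior draw for every validation row, so averaging over all draws (as `colMeans` does on the untransposed matrix) is probably closer to what you want than picking two columns.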

Dharman
Haci Duru
  • Thank you very much for the quick response! The code does work, but unfortunately I don't know why there are so many columns. I am trying to predict the incremental effect of an incentive, where Uplift = P(Y|X, T=0) - P(Y|X, T=1), i.e. the probability that an individual churns if no incentive is given minus the probability that the same individual churns when the incentive is given. I therefore expected the output to have 2 columns (both probabilities for each individual). At least that was the case when I used a random forest algorithm for uplift modeling. – Marcel Jul 19 '20 at 07:08
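
The many columns are posterior draws rather than the two per-arm probabilities. A rough base R sketch of the uplift definition from the comment, computed from draws, could look like this. The matrices `mu0` and `mu1` are hypothetical stand-ins filled with random numbers; as far as I know, the bartCause predict method has a `type` argument with options such as "mu.0" and "mu.1" that would supply the real draws for each arm, but treat that as an assumption and check the package help before relying on it.

```r
# Hypothetical posterior draws of the churn probability per validation row,
# without (mu0) and with (mu1) the incentive. Random numbers stand in for
# model output; rows are draws, columns are validation rows.
set.seed(42)
n_draws <- 160L
n_obs   <- 5L
mu0 <- matrix(runif(n_draws * n_obs, 0.2, 0.6), nrow = n_draws)
mu1 <- matrix(runif(n_draws * n_obs, 0.1, 0.5), nrow = n_draws)

# Uplift per draw and row: P(Y | X, T = 0) - P(Y | X, T = 1).
uplift_draws <- mu0 - mu1

# Collapse the draws: posterior mean and a 95% interval per validation row.
uplift_mean <- colMeans(uplift_draws)
uplift_ci   <- apply(uplift_draws, 2, quantile, probs = c(0.025, 0.975))
```

Here `uplift_mean` has one value per validation row and can be assigned to a data frame column directly, while `uplift_ci` keeps the uncertainty that a two-column random forest output would not give you.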