partikit predict() returns fewer rows than input data with missing predictor values

Question

I'm having a problem with partikit weighted conditional tree models trained on data with missing values.

I'm manually creating a bagged tree model by giving different integer weights to observations at each cycle.
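As a minimal sketch of how such integer weights can be drawn for one bootstrap cycle (base R only; the names `n` and `w` are illustrative and not taken from my actual script):

```r
set.seed(1)  # only so the sketch is reproducible
n <- 299     # number of rows, as in my data

# Draw n row indices with replacement and count how often each row was drawn.
# The counts are non-negative integers that sum to n.
w <- tabulate(sample.int(n, size = n, replace = TRUE), nbins = n)
```

A vector like `w` is what would then be passed as the `weights` argument of `ctree()` in each cycle.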

But when I used the bootstrapped models to make predictions, I noticed that some of them returned fewer values than there were rows in the input data. Interestingly, out of 299 rows in the input data, the length of the predictions was either 299 or 289. 289 is the number of rows left after removing the rows with missing predictor values.

Digging into the problem, I found that it arises from the interaction of three components:

Using weights in the model;
Having missing data in the predictors;
Using character variables instead of factors in the input data passed to predict()

If any one of these three conditions is absent, the problem doesn't arise and all trees return 299 values.
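Since the third condition is the easiest one to remove, a possible workaround is to convert character columns to factors before calling predict(). This is only a sketch on a made-up data frame (`newdata`, `group` and `x` are hypothetical names), and in practice the factor levels would have to match those of the training data:

```r
# Made-up prediction data with a character predictor and missing values
newdata <- data.frame(
  group = c("a", "b", NA, "a"),
  x = c(1.2, NA, 3.4, 0.7),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each of them to a factor
is_chr <- vapply(newdata, is.character, logical(1))
newdata[is_chr] <- lapply(newdata[is_chr], factor)
```

The missing values stay in place as NA; only the column type changes.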

Here is the data: https://www.dropbox.com/s/98oriv2msce4wu5/anonym_data.rds?dl=0
Here is a script to reproduce the problem: https://www.dropbox.com/s/5y7g2dwt2838pbp/test.R?dl=0

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Sep 02 '20 at 19:46
I tried, by putting random missings in the iris dataset, with a pattern similar to my data but I can't reproduce it — Bakaburg, Sep 03 '20 at 11:33
Hard to say what is going on without a reproducible example. Also regarding the weights: Note that `ctree()` treats the `weights` argument as case weights (also known as frequency weights) and hence expects non-negative integer weights. This seems to conflict with your usage if I understand it correctly. — Achim Zeileis, Sep 03 '20 at 21:58
To make a reproducible example I would need to provide the data, which is not mine to give; I'll try to anonymize it. Regarding the weights, so fractional weights are not allowed? They sum up to the actual number of rows. How is ctree interpreting them then? (I used fractional exponential weights to perform a Bayesian bootstrap) — Bakaburg, Sep 04 '20 at 09:52
I removed the fractional weights and turned them into integer ones, but still I get an alternation of 286 and 299 predicted rows. I'll update the question with the full code and the data... — Bakaburg, Sep 04 '20 at 10:22
Dear @AchimZeileis I think I found the problem. I updated the question. Please let me know if I'm making a fundamental mistake with the fractional weights, that would be quite a bigger problem. If you prefer I can make a new question on stats exchange. — Bakaburg, Sep 04 '20 at 12:14
Please boil down the question and the example to something that is more easily intelligible. I think SO readers do not profit from the four extensive updates. Instead I would delete everything that is not needed and just retain a minimal reproducible example. My feeling would be that this should not require complex data and code from a Dropbox account. Regarding the change in handling of character variables: This might well be caused by the changed default of stringsAsFactors in base R, starting from R 4.0.0. — Achim Zeileis, Sep 04 '20 at 15:35
I rewrote the question focusing it after my latest discoveries; to simplify it even further I used simple bootstrap this time. I'll open another question about the weights. — Bakaburg, Sep 04 '20 at 17:10

score 1 · Answer 1 · answered Feb 23 '22 at 05:10

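The links no longer work, but I think you meant partykit. Even though ctree models can deal with missing data, predict.party seems to have difficulties with it. The code calls model.frame with the default na.action. The formal default of that argument is na.fail, but in base R the effective default comes from options("na.action"), which is na.omit in a fresh session and drops rows with missing values.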

I'm not good enough to say whether that's a bug, but it seems strange to me, and changing it will likely fix the issue you are seeing. You can download the partykit source code and modify this line by adding the option na.action = na.pass.
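As a rough illustration of why that option matters, here is a base R sketch on toy data. It does not touch partykit internals, and the data frame `d` is made up; it only shows how na.action changes the number of rows that model.frame returns:

```r
# Toy data: 5 rows, one of them with a missing predictor value
d <- data.frame(y = c(1, 2, 3, 4, 5), x = c(0.5, NA, 1.5, 2.0, 2.5))

# na.omit (the effective default in a fresh R session) drops the incomplete row
mf_omit <- model.frame(y ~ x, data = d, na.action = na.omit)
nrow(mf_omit)

# na.pass keeps every row and leaves the NA in place
mf_pass <- model.frame(y ~ x, data = d, na.action = na.pass)
nrow(mf_pass)
```

The first call returns 4 rows and the second 5, which is the same kind of shortfall as 289 versus 299 in the question.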

Although I hope you are not still having this issue 1 year and 5 months later.

malavv

thanks!! I think I solved it with time, but now I really can't remember how... – Bakaburg Feb 23 '22 at 13:57
what was strange is that if weights are not used the problem would not arise – Bakaburg Feb 23 '22 at 13:57