
I have an xgboost model on two different servers - a test server and a production server. Each server has exactly the same data and exactly the same code, but when I apply the same model to the same data in each environment I get a slightly different result. We need the results to be identical.

I've found that the sparse matrix object that the following line returns is different on each server:

mm <- sparse.model.matrix(y ~ ., data = df.new)[,-1]

The mm on the test server has @i and @x of length 182, whereas the mm on the production server has @i and @x of length 184. Again, I've compared the df.new from both servers and they are identical.

I've tried downgrading the Matrix package on the production server so that the versions match, but it's still producing different results. The only idea I have left is to match the versions of every package.

Does anyone have any suggestions for what might be happening? Unfortunately I can't share the data, but if it helps, it's 227 variables of mixed types (775 when converted to sparse model matrix). A lot of the variables are mostly 0.

I don't know if it makes a difference or not, but the test server is Windows and the production server is Linux.

user123965
  • Is the `xgboost` model loaded onto the servers or created each time? It can involve some sampling, so with the same code and data it could produce different model outputs unless you control the seed – Jonny Phelps Jan 24 '20 at 15:16
  • the model is loaded, so the output should be the same. If I take the smm from each server and run them through the model on my local machine, I get the same issue (i.e. each predicted value is slightly different - consistently so) – user123965 Jan 24 '20 at 15:19
  • I have an ordered factor variable, `stry`, which can have values: 1, 1.5, 2, 2.5, 3. The new data has an observation with `stry=2`, but once converted to a sparse matrix the test server has `stry.L = 0.00e+00` and `stry.C = -4.09e-16`, whereas the production server has `stry.L = -3.51e-17` and `stry.C = 1.75e-16`. I don't really understand where those numbers come from (I would have thought they should be binaries), but could there be a difference in how `sparse.model.matrix` treats very small numbers on Linux vs. Windows? – user123965 Jan 24 '20 at 15:31
  • Yeah, it could be differences in the OS. Does it have a large impact on the model predictions? These numbers are still really small; whether it's `-4.09e-16` or `1.75e-16`, I wouldn't have thought that would make any noticeable difference in prediction error – Jonny Phelps Jan 24 '20 at 15:51
  • The difference would be small enough to not concern me in general. However, for various business reasons an observation with unchanged data but a different predicted value would cause many issues. – user123965 Jan 24 '20 at 16:07

1 Answer


You're getting bitten by the conjunction of two problems:

(1) floating-point computations are inherently sensitive to small differences in the environment (platform, compiler, compiler settings, ...);
(2) ordered factors in R use orthogonal polynomial contrasts (see `?contr.poly`, or Venables and Ripley, Modern Applied Statistics with S), which involve floating-point computation.

dd <- data.frame(x = ordered(0:2))
Matrix::sparse.model.matrix(~ x, dd)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##   (Intercept)           x.L        x.Q
## 1           1 -7.071068e-01  0.4082483
## 2           1 -7.850462e-17 -0.8164966
## 3           1  7.071068e-01  0.4082483

You can see that one of the entries here is close to, but not exactly equal to, zero. So far I haven't actually been able to come up with an example that displays a difference between the two platforms I have handy (Ubuntu Linux and macOS), but this is almost surely the source of your problem: the nearly-zero entry is computed as exactly zero on one platform but not the other.
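To see where the tiny `stry.L`/`stry.C` values from the comments come from, you can inspect the contrast matrix for a five-level ordered factor directly (this just illustrates the mechanism; the exact low-order bits will vary by platform and BLAS build):

```r
## orthogonal polynomial contrasts for a 5-level ordered factor
## (levels 1, 1.5, 2, 2.5, 3 in the original data)
cp <- contr.poly(5)
cp

## the middle row (the level stry = 2) should be exactly zero in the
## .L (linear) and .C (cubic) columns, but it is computed in floating
## point, so it can come out as exact 0 on one platform and as
## something on the order of 1e-16 or 1e-17 on another
cp[3, c(".L", ".C")]
```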

There is probably no perfect solution to this problem, but `zapsmall()` will round very small entries to zero, and `drop0()` will then convert them from explicit to implicit (structural) zero entries, so `drop0(zapsmall(mm))` might work ...
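A minimal sketch of that cleanup on the toy matrix above (assuming `zapsmall()`'s default number of digits is enough to catch the offending entries):

```r
library(Matrix)

dd <- data.frame(x = ordered(0:2))
mm <- sparse.model.matrix(~ x, dd)

## round near-zero entries to exact zero, then drop the explicit
## zeros from the sparse representation, so both platforms end up
## with identical @i and @x slots
mm_clean <- drop0(zapsmall(mm))
mm_clean
```

After this, the `-7.850462e-17` entry is stored as a structural zero rather than a tiny explicit value, so the lengths of `@i` and `@x` should agree across platforms.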

Ben Bolker