
I am using XGBoost for sales forecasting. I need a custom objective function because the value of a prediction depends on the item's sales price. I am struggling to pass the sales price into the loss function alongside the labels and predictions. This is my approach:

def monetary_value_objective(predt: np.ndarray, dtrain: Union[xgb.DMatrix, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
  """
  predt = model prediction
  dtrain = labels 
  Currently, dtrain is a numpy array.
  """

  y = dtrain

  mask1 = predt <= y  # Predict too few
  mask2 = predt > y  # Predict too much

  price = train[0]["salesPrice"]

  grad = price **2 * (predt - y)  
  # Gradient is negative if prediction is too low, and positive if it is too high
  # Here scale it (0.72 = 0.6**2 * 2)
  grad[mask1] = 2 * grad[mask1]
  grad[mask2] = 0.72 * grad[mask2]

  hess = np.empty_like(grad)
  hess[mask1] = 2 * price[mask1]**2
  hess[mask2] = 0.72 * price[mask2]**2

  grad = -grad

  return grad, hess
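The gradient and Hessian in the code above (before the final `grad = -grad` line) correspond to an asymmetric, price-weighted squared error. This is a reconstruction from the code, not something stated in the question. Here $p_i$ is the sales price of row $i$, $\hat{y}_i$ the prediction and $y_i$ the label:

```latex
\ell_i = c_i\, p_i^2\,(\hat{y}_i - y_i)^2, \qquad
c_i = \begin{cases} 1 & \text{if } \hat{y}_i \le y_i \\ 0.36 & \text{if } \hat{y}_i > y_i \end{cases}

g_i = \frac{\partial \ell_i}{\partial \hat{y}_i} = 2\, c_i\, p_i^2\,(\hat{y}_i - y_i), \qquad
h_i = \frac{\partial^2 \ell_i}{\partial \hat{y}_i^2} = 2\, c_i\, p_i^2
```

The factors 2 and 0.72 = 2 · 0.6² in the code are exactly $2c_i$. XGBoost expects the first and second derivatives of the loss with respect to the prediction, so $g_i$ should be returned as is; negating it reverses the direction in which the predictions are pushed.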

I get the following error during hyperparameter tuning:

[09:11:35] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
  0%|          | 0/1 [00:00<?, ?it/s, best loss: ?]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-2c64dc1b5a76> in <module>()
      1 # set runtime environment to GPU at: Runtime -> Change runtime type
----> 2 trials, best_hyperparams = hyperpara_tuning(para_space)
      3 final_xgb_model = trials.best_trial['result']['model']
      4 assert final_xgb_model is not None, "Oooops there is no model created :O "
      5 

17 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexers.py in check_array_indexer(array, indexer)
    399         if len(indexer) != len(array):
    400             raise IndexError(
--> 401                 f"Boolean index has wrong length: "
    402                 f"{len(indexer)} instead of {len(array)}"
    403             )

IndexError: Boolean index has wrong length: 1 instead of 136019

Does anyone have an idea how to use the sales price in the objective function? Is this possible at all?

Thanks!

Nicolas



A bit late, but this answers the OP's question: https://datascience.stackexchange.com/questions/74780/how-to-implement-custom-loss-function-that-has-more-parameters-with-xgbclassifie

Use an outer function that returns the objective function. The returned callback keeps the signature XGBoost expects, but it can still read the outer function's data (a closure).
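A minimal Python sketch of this pattern. It assumes the native `xgb.train` interface, whose `obj` callback receives `(predt, dtrain)`, and a price array aligned row for row with the training `DMatrix`. The factory name `make_monetary_objective` and its `under_weight`/`over_weight` parameters are hypothetical. The penalty factors 1.0 and 0.36 come from the question's code, and the gradient is returned without the sign flip.

```python
import numpy as np


def make_monetary_objective(price, under_weight=1.0, over_weight=0.36):
    """Return an objective with the (predt, dtrain) signature that can see `price`.

    `price` is captured by the closure and is assumed (hypothetically) to be a
    NumPy array aligned row for row with the training DMatrix.
    """
    price = np.asarray(price, dtype=float)

    def objective(predt, dtrain):
        y = dtrain.get_label()
        # Fail loudly instead of silently misaligning prices and rows.
        if price.shape != y.shape:
            raise ValueError(f"price has {price.size} rows but the labels have {y.size}")
        # Per-row factor: under-prediction is penalised more than over-prediction.
        factor = np.where(predt <= y, under_weight, over_weight)
        # Per-row loss: factor * price**2 * (predt - y)**2
        grad = 2.0 * factor * price**2 * (predt - y)  # first derivative w.r.t. predt
        hess = 2.0 * factor * price**2                # second derivative w.r.t. predt
        return grad, hess

    return objective
```

It would then be passed as, for example, `xgb.train(params, dtrain, num_boost_round=100, obj=make_monetary_objective(price))`, where `price` is a hypothetical NumPy array built from the same rows as `dtrain`. If the rows are split or subsampled (for example inside a hyperparameter search), the objective has to be rebuilt with the matching slice of prices. The shape check turns a misalignment into an explicit error.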

Chris

You can use the weights vector in your custom objective function. If you encode your external variable as a weight distribution, it could work. However, I don't know whether the weights are used only in the objective function itself or also at the level of data sampling; if the latter, the situation becomes much more complicated...
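A sketch of the weights idea. It assumes the native Python interface and that the sales price was stored as the sample weight when the `DMatrix` was built, for example `xgb.DMatrix(X, label=y, weight=price)`. Inside the objective, the price is then read back with `dtrain.get_weight()`. The function name is hypothetical. The sketch also assumes XGBoost uses the returned gradient and Hessian as is. Whether the weights also affect other stages of training (evaluation metrics, sampling) is the open question raised above and should be checked for your XGBoost version.

```python
import numpy as np


def weighted_asymmetric_objective(predt, dtrain):
    """Objective that reads the per-row sales price from the DMatrix weights.

    Assumes the price was stored via xgb.DMatrix(X, label=y, weight=price).
    """
    y = dtrain.get_label()
    price = dtrain.get_weight()
    # Assumption: an empty weight array means no weights were set.
    if price.size == 0:
        raise ValueError("no weights stored in dtrain; build it with weight=price")
    # Same asymmetric factors as in the question (1.0 under, 0.36 over).
    factor = np.where(predt <= y, 1.0, 0.36)
    grad = 2.0 * factor * price**2 * (predt - y)
    hess = 2.0 * factor * price**2
    return grad, hess
```

One possible advantage over a closure is that the price travels with the rows of the `DMatrix` instead of living in a separate array.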

Qbik

You can use closures to pass the values you need from the enclosing environment while keeping the signature required for the objective function. Here is an R example:

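# Define a closure: the outer function captures `original.d`,
# the inner function keeps the (preds, dtrain) signature xgboost expects
objectiveShell <- function(original.d) {
    myobjective <- function(preds, dtrain) {
        # must be row-aligned with dtrain
        extradata <- original.d$some_additional_data
        labels <- getinfo(dtrain, "label")
        grad <- (preds - labels) + extradata
        hess <- rep(1, length(labels))
        return(list(grad = grad, hess = hess))
    }
    return(myobjective)
}

# Model parameters
# (the R package uses R's RNG, so seed it with set.seed() rather than in params)
set.seed(2020)
param1 <- list(booster = 'gbtree',
               learning_rate = 0.1,
               # This is how you use the closure
               objective = objectiveShell(DESIRED_DATA_FRAME),
               # evalerror: a custom evaluation function assumed to be defined elsewhere
               eval_metric = evalerror)

# Train model
xgb1 <- xgb.train(params = param1,
                  data = dtrain,
                  nrounds = 5,
                  watchlist = watchlist,
                  maximize = FALSE,
                  early_stopping_rounds = 5)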

A Python example can be found here: https://datascience.stackexchange.com/questions/74780/how-to-implement-custom-loss-function-that-has-more-parameters-with-xgbclassifie
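The closure idea also carries over to the scikit-learn wrapper (`XGBRegressor`), with one difference worth noting: there a callable `objective` is called as `objective(y_true, y_pred)`, labels first and predictions second, and both arrive as NumPy arrays. The sketch below binds the price with `functools.partial` instead of a nested function. `monetary_loss` is a hypothetical name, and the 1.0/0.36 factors are taken from the question's code.

```python
from functools import partial

import numpy as np


def monetary_loss(y_true, y_pred, price):
    """Asymmetric price-weighted squared error for the scikit-learn wrapper.

    Labels come first, predictions second. `price` must be aligned with the
    rows passed to fit().
    """
    factor = np.where(y_pred <= y_true, 1.0, 0.36)
    grad = 2.0 * factor * price**2 * (y_pred - y_true)
    hess = 2.0 * factor * price**2
    return grad, hess
```

Usage would look like `XGBRegressor(objective=partial(monetary_loss, price=price_train))`, where `price_train` is a hypothetical array with the same rows, in the same order, as the `X` passed to `fit()`.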

AShojaee