
I have historical purchase data for about 10k customers over 3 months, and I want to use it to predict their purchases in the next 3 months. I am using Customer ID as an input variable, because I want XGBoost to learn individual spending patterns across different categories. Is there a way to tweak the model so that the emphasis is on learning from each individual's purchases? Or is there a better way to frame this problem?

muni

2 Answers


This capability arrived with XGBoost 1.3.0 (released 10 December 2020) under the name feature_weights: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit . I'll edit this answer when I can find or work through a tutorial for it.


You can use a weight vector, passed through the weight argument in xgboost: a vector of length nrow(trainingData). However, this is generally used to penalize mistakes on particular instances (think of sparse data with items that only sell, say, once a month; to learn those sales you need to give more weight to the sale instances, or else all predictions will be zero). You appear to be trying to weight an independent variable, which I am not able to understand well.

Learning the behavior of the dependent variable (sales, in your case) is exactly what a machine learning model does; you should let it do its job rather than forcing it to learn from certain features only. For learning purchase behavior, unsupervised techniques such as clustering will be more useful.

To include user-specific behavior, a first attempt would be to cluster users and identify under-indexed and over-indexed categories for each user. You can then create categorical features from these flags.
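A sketch of this idea, with hypothetical per-user spend data and category names (grocery, apparel, etc. are illustrative, not from the question): cluster users on their category mix, then flag categories where each user is over-indexed relative to the population mean.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-user spend totals across purchase categories
rng = np.random.default_rng(0)
cats = ["grocery", "apparel", "electronics", "travel"]
spend = pd.DataFrame(rng.gamma(2.0, 50.0, size=(200, 4)), columns=cats)

# Normalise to spend shares so clustering captures the mix, not the volume
shares = spend.div(spend.sum(axis=1), axis=0)

# Cluster users by their category mix
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster = km.fit_predict(shares)

# Flag categories where a user's share exceeds the population mean
over_indexed = shares.gt(shares.mean())

# Combine cluster label and flags into features for the supervised model
features = shares.assign(cluster=cluster).join(over_indexed.add_suffix("_over"))
```

These per-user flags can then be fed to XGBoost alongside the other features, combining individual and population-level trends.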

PS: Posting some sample data to illustrate your problem would help others help you better.

abhiieor
  • I think the request is to weight features, not individuals, i.e. to make the model more likely to include some variables (such as amounts) in place of others (such as number of purchases or simple product possession). – Eric Lecoutre Aug 01 '16 at 09:52
  • As Eric mentioned above, the objective is to give some weight to the users, so that the features XGBoost learns also reflect the individual trend, rather than only the generalised trend across the group of users. – muni Aug 01 '16 at 10:45
  • Yes, I tried to answer both questions: weights are possible per record, but not per feature. Forcing weight onto a feature would make the model less powerful. If you already knew the first split point of the tree (the most important one), splitting the data on it and then learning would be fine; but tree ensembles exist precisely to remove the high variance of single trees like CART, and forcing a variable in would reintroduce that variance. Nor can you rescale one variable to make it more important than another, as in regression, because trees are scale-independent. I was arguing against thinking of feature importance that way. Posting sample data would still help. – abhiieor Aug 01 '16 at 11:56
  • I will try clustering, maybe. – muni Aug 01 '16 at 12:11
  • I want to combine the individual and the generalised trends for prediction. – muni Aug 01 '16 at 12:48