Questions tagged [xgboost]

XGBoost is a library for constructing boosted tree models in R, Python, Java, Scala, and C++. Use this tag for issues specific to the package (e.g., input/output, installation, functionality).

Before using the XGBoost tag, first check whether your issue is specific to XGBoost's functionality. Problems often arise from the surrounding model-building environment (such as R's caret or Python's scikit-learn), from the quality of the data being used, or from purely statistical concerns that might belong on Cross Validated.

2788 questions
113
votes
12 answers

Process finished with exit code 137 in PyCharm

When I stop a script manually in PyCharm, it finishes with exit code 137. But this time I didn't stop the script, and I still got exit code 137. What's the problem? Python version is 3.6, and the process finished while running the xgboost.train() method.
shawe
  • 1,155
  • 2
  • 7
  • 3
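Exit code 137 is not PyCharm-specific: on POSIX systems a process killed by signal N exits with status 128 + N, and 137 = 128 + 9 is SIGKILL — typically the kernel's out-of-memory killer terminating a process whose memory use (e.g. a large DMatrix built for xgboost.train()) exceeded available RAM. A minimal sketch of the decoding convention (assumes a Unix-like OS, where signal.SIGKILL is defined):

```python
import signal

# POSIX shell convention: a process killed by signal N reports exit status 128 + N.
SIGKILL_EXIT = 128 + int(signal.SIGKILL)  # SIGKILL is signal 9, so 128 + 9
print(SIGKILL_EXIT)  # 137
```

So exit code 137 usually means "killed by the OS", not a Python-level error; reducing the training data size or memory footprint is the usual fix.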
87
votes
5 answers

How to save & load xgboost model?

From the XGBoost guide: After training, the model can be saved. bst.save_model('0001.model') The model and its feature map can also be dumped to a text file. # dump model bst.dump_model('dump.raw.txt') # dump model with feature…
Pengju Zhao
  • 1,439
  • 3
  • 14
  • 17
80
votes
21 answers

How to install xgboost in Anaconda Python (Windows platform)?

I am a new Python user. I downloaded the latest Anaconda 3 2.4.1 (Python 3.5) from the link below: https://www.continuum.io/downloads My PC configuration is: Windows 10, 64-bit, 4 GB RAM. I have spent hours trying to find the right way to download…
Siddharth Vashishtha
  • 1,090
  • 2
  • 9
  • 15
68
votes
11 answers

How to get feature importance in xgboost?

I'm using xgboost to build a model and trying to find the importance of each feature using get_fscore(), but it returns {}. My training code is: dtrain = xgb.DMatrix(X, label=Y) watchlist = [(dtrain, 'train')] param = {'max_depth': 6, 'learning_rate':…
modkzs
  • 1,369
  • 4
  • 13
  • 17
62
votes
4 answers

XGBoost Categorical Variables: Dummification vs encoding

When using XGBoost we need to convert categorical variables into numeric ones. Would there be any difference in performance/evaluation metrics between the methods of: dummifying your categorical variables, or encoding your categorical variables from e.g.…
ishido
  • 4,065
  • 9
  • 32
  • 42
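The two encodings the question contrasts can be sketched as follows (column and category names are illustrative; recent xgboost versions also accept pandas categorical columns natively via `enable_categorical=True`, which sidesteps both):

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})

# 1) Dummification (one-hot): one binary column per category.
#    No artificial ordering, but dimensionality grows with cardinality.
onehot = pd.get_dummies(df['color'], prefix='color')

# 2) Integer (ordinal) encoding: a single column of arbitrary codes.
#    Compact, but imposes a fake ordering that trees must split around.
codes = df['color'].astype('category').cat.codes
print(onehot.shape, codes.tolist())
```

For tree models the fake ordering of integer codes is often tolerable (trees can carve it into pieces), but high-cardinality one-hot columns can dilute per-feature split gains, so the evaluation-metric impact depends on the data.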
59
votes
10 answers

How can I implement incremental training for xgboost?

The problem is that my training data cannot fit into RAM due to its size. So I need a method that first builds one tree on the whole training set, calculates the residuals, builds another tree, and so on (as gradient boosted trees do).…
Marat Zakirov
  • 905
  • 1
  • 8
  • 13
54
votes
1 answer

How does XGBoost do parallel computation?

XGBoost uses additive training, in which each new tree models the residual of the previous model. That is sequential, though; how does it do parallel computing then?
Cedric Oeldorf
  • 579
  • 1
  • 4
  • 10
52
votes
5 answers

XGBoost XGBClassifier Defaults in Python

I am attempting to use XGBoost's classifier to classify some binary data. When I do the simplest thing and just use the defaults (as follows): clf = xgb.XGBClassifier() metLearn=CalibratedClassifierCV(clf, method='isotonic', cv=2) metLearn.fit(train,…
Chris Arthur
  • 1,139
  • 2
  • 10
  • 11
49
votes
18 answers

How to install xgboost package in python (windows platform)?

http://xgboost.readthedocs.org/en/latest/python/python_intro.html On the xgboost homepage (above link), it says: To install XGBoost, do the following steps: You need to run make in the root directory of the project In the python-package directory…
Robin1988
  • 1,504
  • 4
  • 20
  • 25
47
votes
2 answers

How is the feature score (importance) in the XGBoost package calculated?

The command xgb.importance returns a graph of feature importance measured by an F score. What does this F score represent, and how is it calculated? Output: graph of feature importance
ishido
  • 4,065
  • 9
  • 32
  • 42
46
votes
6 answers

Multi-output regression with xgboost

Is it possible to train a model with xgboost that has multiple continuous outputs (multi-output regression)? What would be the objective for training such a model? Thanks in advance for any suggestions.
user1782011
  • 875
  • 1
  • 7
  • 13
42
votes
3 answers

xgboost in R: how does xgb.cv pass the optimal parameters into xgb.train

I've been exploring the xgboost package in R and went through several demos as well as tutorials, but this still confuses me: after using xgb.cv to do cross-validation, how do the optimal parameters get passed to xgb.train? Or should I calculate…
snowneji
  • 1,086
  • 1
  • 11
  • 25
38
votes
1 answer

How do I use a TimeSeriesSplit with a GridSearchCV object to tune a model in scikit-learn?

I've searched the sklearn docs for TimeSeriesSplit and the docs for cross-validation but I haven't been able to find a working example. I'm using sklearn version 0.19. This is my setup import xgboost as xgb from sklearn.model_selection import…
cd98
  • 3,442
  • 2
  • 35
  • 51
37
votes
3 answers

GridSearchCV - XGBoost - Early Stopping

I am trying to do hyperparameter search using scikit-learn's GridSearchCV on XGBoost. During grid search I'd like it to stop early, since that reduces search time drastically and I expect better results on my prediction/regression task.…
ayyayyekokojambo
  • 1,165
  • 3
  • 13
  • 33
35
votes
4 answers

What is the difference between xgb.train and xgb.XGBRegressor (or xgb.XGBClassifier)?

I already know that "xgboost.XGBRegressor is a Scikit-Learn Wrapper interface for XGBoost." But do they have any other differences?
Statham
  • 4,000
  • 2
  • 32
  • 45