
SITUATION

When I call xgboost.plot_tree, the graph shows only empty boxes in place of the titles, labels and numbers. My model uses more than 400 features, which may be a contributing factor.

CODE 1

import matplotlib.pyplot as plt
from xgboost import plot_tree

fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(xgbmodel, ax=ax)
plt.savefig("temp.pdf")
plt.show()

CODE 2

plot_tree(xgbmodel, num_trees=2)
fig = plt.gcf()
fig.set_size_inches(150, 100)
fig.savefig('tree.png')

ERROR

  • Both CODE 1 and CODE 2 produce the same image.
  • This is only a crop of the whole tree, which is too big to upload here, but the tree shape looks correct.


SOLUTIONS I HAVE TRIED

  • Plot a Single XGBoost Decision Tree: that question is about a failure to plot at all, whereas I can plot without any problem.
  • xgboost.plot_tree: binary feature interpretation: that question covers a different issue.
  • I ran the code that @jared_mamrot gave me and it produced the same problem. I restarted and cleaned my environment and ran that code first, and nothing else, in the same notebook.
  • GitHub recommendation: model.get_booster().get_dump(dump_format='text') printed a bit more than 200,000 characters (about 63 A4 pages in 11 pt Calibri), and the output looks perfectly correct, e.g. 0.0268656723\n\t\t\t\t\t34:[f0<6.5] yes=53,no=54,missing=53\n\t\t\t\t\t\. Is it possible that I have this issue because so much text cannot be displayed in a graph of normal size?
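Since the text dump looks correct, one way to sanity-check a tree without plotting it is to summarize the dump. The following is only a minimal sketch: the function name summarize_dump and the sample string are made up for illustration, and the regular expressions assume the text dump format shown above (split lines like 34:[f0<6.5] yes=53,no=54,missing=53 and leaf lines like 1:leaf=0.0268656723).

```python
import re

# Split node line in the text dump, e.g. "34:[f0<6.5] yes=53,no=54,missing=53".
SPLIT_RE = re.compile(
    r"^\s*(\d+):\[([^<\]]+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)"
)
# Leaf line in the text dump, e.g. "1:leaf=0.0268656723".
LEAF_RE = re.compile(r"^\s*(\d+):leaf=(\S+)")


def summarize_dump(tree_text):
    """Count split nodes and leaves and collect the feature names used."""
    splits = 0
    leaves = 0
    features = set()
    for line in tree_text.splitlines():
        match = SPLIT_RE.match(line)
        if match:
            splits += 1
            features.add(match.group(2))
        elif LEAF_RE.match(line):
            leaves += 1
    return {"splits": splits, "leaves": leaves, "features": sorted(features)}


# Hypothetical one-split tree, written in the same format as the dump above.
sample = "0:[f0<6.5] yes=1,no=2,missing=1\n\t1:leaf=0.0268656723\n\t2:leaf=-0.1\n"
summary = summarize_dump(sample)
print(summary)
```

With a real model, each element of the list returned by get_dump could be passed to summarize_dump in place of sample. If the counts look sensible, the tree itself is probably fine and the problem is limited to how the text is rendered.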
sogu
    I had a similar problem on my Ubuntu machine, and solved it by installing fontconfig: https://howtoinstall.co/en/fontconfig – matthiash Jul 02 '21 at 21:07
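A quick way to test the fontconfig theory from the comment above is to check whether the relevant command-line tools are visible from the notebook's Python process. This is a rough sketch, not a definitive diagnosis. It assumes, as far as I understand it, that plot_tree has Graphviz render the tree and only displays the resulting image in matplotlib, so the fonts would come from the system rather than from matplotlib. The function names are made up for illustration; dot is the Graphviz renderer and fc-list is the fontconfig tool that lists known fonts.

```python
import shutil
import subprocess


def check_rendering_tools():
    """Map each tool name to its path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in ("dot", "fc-list")}


def count_fonts():
    """Return how many fonts fc-list reports, or None if fc-list is missing."""
    fc_list = shutil.which("fc-list")
    if fc_list is None:
        return None
    result = subprocess.run([fc_list], capture_output=True, text=True, check=False)
    return len([line for line in result.stdout.splitlines() if line.strip()])


report = check_rendering_tools()
for tool, path in report.items():
    print(f"{tool}: {path or 'NOT FOUND'}")
print("fonts visible to fontconfig:", count_fonts())
```

If fc-list is missing or reports zero fonts on the machine that draws empty boxes, installing fontconfig and at least one font package would be the first thing to try.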

2 Answers


I wasn't able to reproduce your error. Can you please add more details to your question and confirm that this code works? It uses the pima-indians-diabetes.csv dataset.

#!/usr/bin/env python3

# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
import matplotlib.pyplot as plt
import graphviz

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]

# fit model on training data
model = XGBClassifier()
model.fit(X, y)

# plot/save fig
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax)
plt.savefig("test.pdf")

Edit per comment:

I can't reproduce this issue/error. Whichever package versions, character encodings or line endings I try, my notebook always renders the text correctly. The only thing I can suggest is creating a fresh virtual environment (e.g. with miniconda) with current versions of the required packages (conda install notebook numpy matplotlib xgboost graphviz python-graphviz) and testing it again.
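To compare the broken environment with a fresh one, it may help to print the installed versions in both and diff the output. A minimal sketch using only the standard library follows. The package list mirrors the conda command above, with one assumption: the name graphviz here refers to the Python bindings (python-graphviz in conda), not to the Graphviz system binaries. The function name installed_versions is made up for illustration.

```python
from importlib import metadata
import platform

# Distribution names to look up; "graphviz" is the Python binding package.
PACKAGES = ["notebook", "numpy", "matplotlib", "xgboost", "graphviz"]


def installed_versions(names):
    """Return a dict of package name to version string, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(PACKAGES)
print("python", platform.python_version())
for name, version in versions.items():
    print(name, version or "not installed")
```

Running this in both environments and comparing the lines shows at a glance whether a version difference could be involved.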

Also, make sure you don't have Windows line endings (see: Matplotlib plotting some characters as blank square / https://github.com/jupyterlab/jupyterlab/issues/1104 / https://github.com/jupyterlab/jupyterlab/issues/3718 / https://github.com/jupyterlab/jupyterlab/pull/3882 ) and specify the font you are using (e.g. How to change fonts in matplotlib (python)?):

# plot decision tree
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_tree
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
import graphviz

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]

# fit model on training data
model = XGBClassifier()
model.fit(X, y)

# plot/save fig
prop = FontProperties()
prop.set_file('Arial.ttf')
fig, ax = plt.subplots(figsize=(170, 170))
plot_tree(model, ax=ax, fontproperties=prop)
plt.savefig("test.png")
fig.show()
jared_mamrot

I moved my whole environment from an AWS EC2 instance to a local machine, and then it ran perfectly. The EC2 instance also showed other odd behaviour, such as not allowing extensions in JupyterLab. Both machines run Ubuntu 20.04 LTS.

sogu