Python: 3.6
Windows: 10
I have a few questions about Random Forest and the problem at hand:
I am using a hyperparameter search (`RandomizedSearchCV`) to tune a Random Forest for a regression problem. I want to plot the tree corresponding to the best-fit parameters that the search found. Here is the code:
```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# X, y and random_grid are defined earlier (not shown)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=55)

# First create the base model to tune
rf = RandomForestRegressor()
# Random search over random_grid: try 50 sampled combinations,
# score each with 5-fold cross-validation, and use all available cores
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, n_iter=50,
                               cv=5, verbose=2, random_state=56, n_jobs=-1)
# Fit the random search model
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)
```
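For reference, a self-contained version of the search above might look like the sketch below. The data and `random_grid` are hypothetical stand-ins (the original grid is not shown), and the grid is kept small so it runs quickly. It also shows that after fitting, `best_estimator_` holds a forest refit on the whole training set with `best_params_` (the default `refit=True`).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hypothetical stand-in data: 6 binary predictors and a ratio target in [0, 1]
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(300, 6))
y = np.clip(0.2 + 0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.1, 300), 0, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=55)

# Hypothetical, deliberately small grid so the example finishes quickly
random_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "bootstrap": [True, False],
}
rf_random = RandomizedSearchCV(RandomForestRegressor(random_state=0),
                               param_distributions=random_grid,
                               n_iter=10, cv=5, random_state=56)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)

# best_estimator_ is a RandomForestRegressor refit on all of X_train with best_params_
best_rf = rf_random.best_estimator_
print("held-out R^2:", best_rf.score(X_test, y_test))
```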
The best parameters found were:
```
{'n_estimators': 1000,
 'min_samples_split': 5,
 'min_samples_leaf': 1,
 'max_features': 'auto',
 'max_depth': 5,
 'bootstrap': True}
```
How can I plot the tree(s) of the forest fitted with the above parameters?
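One point worth noting: with `n_estimators=1000` the fitted model is an ensemble of 1000 trees, so there is no single tree to plot. A fitted forest exposes its individual trees in `estimators_` (for the search above, `rf_random.best_estimator_.estimators_`), and any one of them can be printed or drawn. The sketch below assumes hypothetical binary data, uses 10 trees instead of 1000 for speed, and prints the first tree with `sklearn.tree.export_text`; `sklearn.tree.plot_tree(first_tree)` draws the same tree with matplotlib. Both functions are available from scikit-learn 0.21.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_text

# Hypothetical stand-in data: 4 binary predictors, target clipped to [0, 1]
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 4))
y = np.clip(0.3 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.1, 200), 0, 1)

# Best parameters from the search, but with 10 trees to keep this fast.
# max_features=1.0 (all features) is what 'auto' meant for regressors;
# 'auto' was removed in scikit-learn 1.3.
forest = RandomForestRegressor(n_estimators=10, min_samples_split=5, min_samples_leaf=1,
                               max_features=1.0, max_depth=5, bootstrap=True,
                               random_state=0)
forest.fit(X, y)

# The forest averages all of these trees; each one can be inspected separately
print(len(forest.estimators_))
first_tree = forest.estimators_[0]
report = export_text(first_tree, feature_names=["f0", "f1", "f2", "f3"])
print(report)
```

Plotting a single tree only shows one of the trees that get averaged, so it can help with intuition but does not describe the forest's overall prediction.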
My dependent variable `y` is continuous and lies in the range [0, 1], and all predictor variables are either binary or categorical. Which algorithms generally work well for this kind of input and output space? I tried Random Forest, but it did not give good results. Note that `y` is a ratio, which is why it lies between 0 and 1; for example, expense on food / total expense.
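To judge whether a Random Forest result is "not good", it can help to compare it against a trivial baseline and a few other regressors under the same cross-validation. This is a sketch on hypothetical data (binary predictors, a target clipped to [0, 1] so that many rows sit exactly at 1); the model list is an illustrative choice, not a recommendation.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical stand-in data: 5 binary predictors, ratio target in [0, 1]
rng = np.random.RandomState(1)
X = rng.randint(0, 2, size=(300, 5))
y = np.clip(0.2 + 0.5 * X[:, 0] + 0.3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.1, 300), 0, 1)

models = {
    "baseline (mean)": DummyRegressor(strategy="mean"),
    "linear": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    # cross_val_score returns negated MAE, so flip the sign (lower MAE is better)
    mae = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error").mean()
    scores[name] = mae
    print(f"{name:>18}: MAE = {mae:.3f}")
```

One property relevant to a [0, 1] target: a random forest's prediction is an average of training targets, so it never leaves [0, 1], whereas `LinearRegression` can predict outside that range.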
The data is also skewed: `y` equals 1 in 60% of the rows and lies strictly between 0 and 1 (e.g. 0.66, 0.87) in the rest.

Since my data contains only binary {0, 1} and categorical {A, B, C} variables, do I need to convert the categorical variables into one-hot encoded variables to use Random Forest?
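scikit-learn's tree models need numeric input, so string categories such as A/B/C must be encoded first; the binary {0, 1} columns can be used as they are. A minimal sketch, with hypothetical column names, that one-hot encodes only the categorical column inside a `Pipeline` (an `OrdinalEncoder` is a common alternative for tree models):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical columns: two binary flags and one categorical column with levels A/B/C
rng = np.random.RandomState(2)
n = 120
df = pd.DataFrame({
    "flag_1": rng.randint(0, 2, n),
    "flag_2": rng.randint(0, 2, n),
    "category": rng.choice(["A", "B", "C"], n),
})
y = np.clip(0.3 + 0.4 * (df["category"] == "B") + 0.2 * df["flag_1"]
            + rng.normal(0, 0.05, n), 0, 1)

# Only the string column is one-hot encoded (A, B, C -> three 0/1 columns);
# the binary flags pass through unchanged
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["category"])],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
])
model.fit(df, y)

preds = model.predict(df.head(3))
print(preds)
# Encoded width: 3 one-hot columns + 2 passthrough columns
encoded = model.named_steps["prep"].transform(df.head(3))
print(encoded.shape)
```

If this pipeline is passed to `RandomizedSearchCV`, the forest's parameters are addressed with the step prefix, e.g. `rf__max_depth` instead of `max_depth`.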