
I would like to check with you if my understanding about ensemble learning (homogeneous vs heterogeneous) is correct.

Is the following statement correct?

A homogeneous ensemble is a set of classifiers of the same type built on different data (such as a random forest), and a heterogeneous ensemble is a set of classifiers of different types built on the same data.

If it's not correct, could you please clarify this point?

Andrew Myers
Zoya

2 Answers


A homogeneous ensemble consists of members built with a single type of base learning algorithm. Popular methods like bagging and boosting generate diversity by sampling from, or assigning weights to, the training examples, but they generally use a single type of base classifier to build the ensemble.
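
As a quick illustration (a minimal sketch, assuming scikit-learn; the parameters are illustrative), both models below are homogeneous ensembles: every member is a decision tree, and the diversity comes either from bootstrap sampling (bagging) or from re-weighting the training examples (boosting).

from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Diversity from bootstrap-sampling the training data, single base learner type
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Diversity from re-weighting training examples, again a single base learner type
boosted_trees = AdaBoostClassifier(n_estimators=50)  # uses shallow decision trees by default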

A heterogeneous ensemble, on the other hand, consists of members built with different base learning algorithms, such as an SVM, an ANN and a decision tree. A popular heterogeneous ensemble method is stacking, in which a meta-learner is trained on the predictions of the base models.
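
For instance, scikit-learn ships a StackingClassifier; the following is a minimal sketch (the chosen base models and parameters are illustrative) of a heterogeneous ensemble where a meta-learner combines the predictions of different base algorithms.

from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Different base learning algorithms (SVM, ANN, decision tree) on the same data,
# combined by a logistic-regression meta-learner trained on their predictions
clf_stack = StackingClassifier(
    estimators=[
        ('svm', SVC(probability=True)),
        ('ann', MLPClassifier(max_iter=500)),
        ('tree', DecisionTreeClassifier()),
    ],
    final_estimator=LogisticRegression(),
)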

This table contains examples for both homogeneous and heterogeneous ensemble models.

EDIT:

Homogeneous ensemble methods use the same feature selection method with different training data, distributing the dataset over several nodes, while heterogeneous ensemble methods use different feature selection methods with the same training data.
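
A minimal sketch of that distinction, assuming scikit-learn pipelines (the selectors, models and the X_train/y_train names are illustrative only):

from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Heterogeneous flavour: different feature selection methods, the same training data
hetero_a = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
hetero_b = make_pipeline(SelectKBest(mutual_info_classif, k=10), LogisticRegression())

# Homogeneous flavour: the same feature selection method, but each member sees
# a different (bootstrap-resampled) portion of the data
homo = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
X_boot, y_boot = resample(X_train, y_train)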

Giorgos Myrianthous

Heterogeneous Ensembles (HEE) use different fine-tuned algorithms. They usually work well if we have a small number of estimators. Note that the number of algorithms should always be odd (3+) in order to avoid ties. For example, we could combine a decision tree, an SVM and a logistic regression with a voting mechanism to improve the results, and then classify a given sample by the combined wisdom of the majority vote. Besides voting, we can also use averaging or stacking to aggregate the results of the models. The data for each model is the same.

Homogeneous Ensembles (HOE), such as bagging, work by applying the same algorithm to all the estimators. These algorithms should not be fine-tuned -> they should be weak! In contrast to HEE, we use a large number of estimators. Note that the datasets for this model should be sampled separately in order to guarantee independence. Furthermore, the datasets should be different for each model. This allows us to be more precise when aggregating the results of each model. Bagging reduces variance because the sampling is truly random. By using the ensemble itself, we reduce the risk of over-fitting and we obtain a robust model. Unfortunately, bagging is computationally expensive.

EDIT: Here is an example in code.

Heterogeneous ensemble example:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# Instantiate the individual models (three different algorithms)
clf_knn = KNeighborsClassifier(n_neighbors=5)
clf_decision_tree = DecisionTreeClassifier()
clf_logistic_regression = LogisticRegression()

# Create the voting classifier (hard majority vote by default)
clf_voting = VotingClassifier(
    estimators=[
        ('knn', clf_knn),
        ('dt', clf_decision_tree),
        ('lr', clf_logistic_regression),
    ]
)

# Fit it to the training set and predict
clf_voting.fit(X_train, y_train)
y_pred = clf_voting.predict(X_test)
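
The averaging mentioned above can be sketched with the same class by switching to soft voting, which averages the predicted class probabilities instead of counting hard votes (this assumes every member implements predict_proba, which the three models above do):

# Soft voting averages predicted class probabilities instead of counting votes
clf_voting_soft = VotingClassifier(
    estimators=[
        ('knn', clf_knn),
        ('dt', clf_decision_tree),
        ('lr', clf_logistic_regression),
    ],
    voting='soft',
)
clf_voting_soft.fit(X_train, y_train)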

Homogeneous ensemble example:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# Instantiate the base estimator: a deliberately weak model (max depth limited to 3)
clf_decision_tree = DecisionTreeClassifier(max_depth=3)

# Build the Bagging classifier with 5 estimators (5 decision trees)
# (newer scikit-learn releases name this parameter `estimator` instead of `base_estimator`)
clf_bag = BaggingClassifier(
    base_estimator=clf_decision_tree,
    n_estimators=5,
)

# Fit the Bagging model to the training set
clf_bag.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf_bag.predict(X_test)
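
Because every estimator in bagging is fit on a separately drawn bootstrap sample, the rows a given tree never saw can serve as a built-in validation set. A small sketch, assuming scikit-learn's oob_score option (the number of estimators is illustrative):

# Out-of-bag rows (not drawn for a given tree's bootstrap sample) provide a
# free validation estimate, reflecting that each member sees different data
clf_bag_oob = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    oob_score=True,
)
clf_bag_oob.fit(X_train, y_train)
print(clf_bag_oob.oob_score_)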

Conclusion: In summary, yes, what you say is correct.

DataBach