4.6. Ensembling

4.6.1. Voting classifier/regressor

How to reach democracy in machine learning? By using a voting ensemble!

Max voting is a common ensembling technique that labels each new classification sample with the class predicted by the majority of the models. Suppose we have three models with the following predictions in a binary classification problem:

  • Model 1 -> class 1

  • Model 2 -> class 2

  • Model 3 -> class 1

The final prediction would be class 1, because two of the three models vote for it. Sklearn's VotingClassifier can be used to build such an ensemble.
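The max-voting rule is simple enough to write by hand. Here is a minimal sketch. `max_vote` is a hypothetical helper, not a library function, and ties go to the label seen first.

```python
from collections import Counter


def max_vote(predictions):
    """Return the label predicted by the most models (hypothetical helper)."""
    # most_common(1) returns [(label, count)] for the most frequent label
    return Counter(predictions).most_common(1)[0][0]


# Predictions of Model 1, Model 2 and Model 3 from the example above
print(max_vote([1, 2, 1]))  # 1
```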

It takes a list of individual classifiers and combines them with max voting when its “voting” parameter is set to “hard”. When it is set to “soft”, the ensemble averages the predicted class probabilities of the classifiers (optionally weighted) and predicts the class with the highest average probability.
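Below is a sketch comparing the two modes. The synthetic dataset from `make_classification` and the three base models are arbitrary choices for illustration. Soft voting requires every base model to implement `predict_proba`, which all three do here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# VotingClassifier takes a list of (name, estimator) pairs
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

# Hard voting: majority of the predicted labels
hard_vote = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)
# Soft voting: argmax of the averaged predicted probabilities
soft_vote = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)

print(hard_vote.predict(X[:5]))
print(soft_vote.predict_proba(X[:5]))  # averaged class probabilities
```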

VotingRegressor is the regression counterpart of soft voting. It averages the (optionally weighted) predictions of the individual regressors.
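The following sketch checks that an unweighted VotingRegressor returns the plain mean of its base models' predictions. The synthetic data and the two regressors are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data (illustrative assumption)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

reg = VotingRegressor(
    estimators=[
        ("lin", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ]
).fit(X, y)

# Without weights, the ensemble prediction is the average of the fitted base models
manual = np.mean([est.predict(X[:5]) for est in reg.estimators_], axis=0)
print(np.allclose(reg.predict(X[:5]), manual))  # True
```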

4.6.2. Stacking classifier/regressor

People use stacking to silently win competitions on Kaggle. How does it work?

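As a rule, the stack is built from several well-performing models whose learning algorithms are as different from each other as possible. Then KFold cross-validation is used to generate out-of-fold predictions for each model: every sample is predicted by a copy of the model that was not trained on it.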

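As an example, take 5 models in a stack and 5-fold CV on the data. The 5 folds together cover every sample, so each model contributes one column of out-of-fold predictions (for regression or binary classification), and we end up with 5 columns of predictions. This concludes level 1 of the stack.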
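Level 1 can be sketched with `cross_val_predict`, which returns one out-of-fold prediction per sample. The five models and the synthetic data are arbitrary choices. Keeping only the positive-class probability (one column per model) is an assumption that holds for binary classification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Five models with different learning algorithms
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GaussianNB(),
    KNeighborsClassifier(),
    DecisionTreeClassifier(max_depth=5, random_state=0),
]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Each row's prediction comes from a copy of the model fitted on the other 4 folds;
# [:, 1] keeps the positive-class probability, giving one column per model
level1 = np.column_stack([
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for model in base_models
])
print(level1.shape)  # (500, 5)
```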

In the next level, a final meta-estimator is trained on these columns of predictions as its features, and it makes the final predictions.
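Here is a hand-rolled sketch of both levels on a train/test split. It assumes binary classification and a LogisticRegression meta-estimator.

* The meta-estimator is fitted on out-of-fold predictions from the training set.
* To predict new data, each base model is refit on the full training set.
* Their test predictions then become the meta-estimator's inputs.

Sklearn's stacking estimators follow the same scheme, but this code is only an illustration, not their implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GaussianNB(),
    KNeighborsClassifier(),
]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Level 1 (training): out-of-fold positive-class probabilities, one column per model
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])
# Level 1 (new data): refit each base model on the whole training set
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# Level 2: the meta-estimator learns how to combine the base models' outputs
meta = LogisticRegression().fit(train_meta, y_train)
print("stacked accuracy:", meta.score(test_meta, y_test))
```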

This leverages the strengths of each individual model in the stack by using their outputs as inputs to the final estimator. It can reduce bias and often performs better than any single model in the stack.

This seemingly complicated ensembling technique is implemented in its basic form in Sklearn as StackingClassifier and StackingRegressor. You pass a list of (name, estimator) base estimators and one lightweight final meta-estimator such as LogisticRegression. The resulting ensemble works just like any other Sklearn model.
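A minimal StackingClassifier sketch follows. The base models, synthetic data and `cv=5` are illustrative choices. With the default `stack_method="auto"`, binary classification yields one probability column per base estimator as input to the final estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),  # lightweight meta-estimator
    cv=5,  # 5-fold CV produces the out-of-fold predictions for the meta-estimator
)

# Same fit/predict/score interface as any other Sklearn model
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```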