4.5. Subtle tricks for ML#

4.5.1. Enabling categorical data support in XGBoost#

XGBoost has experimental but very powerful support for categorical features. The main requirement is that you convert the features to Pandas' category data type and set `enable_categorical=True` before feeding them to XGBoost👇
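A minimal sketch of what this can look like, using a toy DataFrame (the column names and the `hist` tree method are just illustrative assumptions; categorical support requires a reasonably recent XGBoost version):

```python
import pandas as pd
from xgboost import XGBRegressor

# Toy data with a text column that should be treated as categorical
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red"],
    "size": [1.0, 2.5, 3.3, 2.1, 0.7],
    "price": [10.0, 20.0, 30.0, 25.0, 8.0],
})

# Convert the feature to Pandas' category data type
df["color"] = df["color"].astype("category")

X, y = df[["color", "size"]], df["price"]

# Tell XGBoost to use its native categorical handling
model = XGBRegressor(tree_method="hist", enable_categorical=True)
model.fit(X, y)
```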

4.5.2. XGBoost built-in encoder vs. OneHotEncoder#

OneHotEncoder performed about 7 times worse than the encoder that comes with XGBoost. Below is a comparison of sklearn's OneHotEncoder and the built-in XGBoost encoder.

As can be seen, the RMSE score is about 7 times worse when OneHotEncoder is pre-applied to the data👇
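A sketch of such a comparison, assuming `X` is a hypothetical DataFrame whose object-typed columns are categorical and `y` is a numeric target:

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBRegressor

# Hypothetical dataset: treat all object columns as categorical
cat_cols = X.select_dtypes("object").columns
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Option 1: pre-apply OneHotEncoder before XGBoost
ohe_pipe = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), cat_cols),
        remainder="passthrough",
    ),
    XGBRegressor(tree_method="hist"),
)
ohe_pipe.fit(X_train, y_train)
rmse_ohe = np.sqrt(mean_squared_error(y_test, ohe_pipe.predict(X_test)))

# Option 2: let XGBoost handle the categories natively
X_cat = X.copy()
X_cat[cat_cols] = X_cat[cat_cols].astype("category")
Xc_train, Xc_test = X_cat.loc[X_train.index], X_cat.loc[X_test.index]

native = XGBRegressor(tree_method="hist", enable_categorical=True)
native.fit(Xc_train, y_train)
rmse_native = np.sqrt(mean_squared_error(y_test, native.predict(Xc_test)))

print(f"OneHotEncoder RMSE:     {rmse_ohe:.3f}")
print(f"Built-in encoder RMSE:  {rmse_native:.3f}")
```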

4.5.3. Switch the APIs in XGBoost#

If you use the Scikit-learn API of XGBoost, you might lose some of the advantages that come with its core training API.

For example, models trained with the core training API let you calculate Shapley values on GPUs, a feature that isn't available in XGBRegressor or XGBClassifier.

Here is how you can get around this problem by extracting the booster object👇
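A sketch of the workaround, assuming an already prepared `X_train`/`y_train`/`X_test` split and a CUDA-capable GPU (newer XGBoost versions may prefer `device="cuda"` over the older `gpu_hist`/`gpu_predictor` parameters used here):

```python
from xgboost import DMatrix, XGBClassifier

# Train through the familiar Scikit-learn API
clf = XGBClassifier(tree_method="gpu_hist")  # assumes a CUDA-capable GPU
clf.fit(X_train, y_train)

# Pull out the underlying core Booster object
booster = clf.get_booster()

# The core API can compute SHAP (contribution) values directly on the GPU
booster.set_param({"predictor": "gpu_predictor"})
shap_values = booster.predict(DMatrix(X_test), pred_contribs=True)

# One column per feature plus a bias term
print(shap_values.shape)
```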

4.5.4. Hyperparameter tuning for multiple metrics with Optuna#

It is a giant waste of time to tune hyperparameters for multiple metrics in separate sessions.

Optuna allows you to create tuning sessions that optimize for as many metrics as you want. Inside your Optuna objective function, simply measure your model with each metric you care about, such as precision, recall, and log loss, and return the values separately.

Then, when you initialize a study object, specify whether you want Optuna to minimize or maximize each metric by passing a list of values to the `directions` argument.
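A sketch of a multi-objective study, assuming `X` and `y` form an existing binary classification dataset and the hyperparameter ranges are just placeholders:

```python
import optuna
from sklearn.metrics import log_loss, precision_score, recall_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

def objective(trial):
    # Placeholder search space for illustration
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = XGBClassifier(**params)
    model.fit(X_train, y_train)

    preds = model.predict(X_valid)
    probs = model.predict_proba(X_valid)

    # Return one value per metric, in a fixed order
    precision = precision_score(y_valid, preds)
    recall = recall_score(y_valid, preds)
    loss = log_loss(y_valid, probs)
    return precision, recall, loss

# Directions match the order of the returned metrics:
# maximize precision, maximize recall, minimize log loss
study = optuna.create_study(directions=["maximize", "maximize", "minimize"])
study.optimize(objective, n_trials=100)

# Multi-objective studies expose a set of Pareto-optimal trials
print(len(study.best_trials))
```

Note that a multi-objective study has no single `best_trial`; instead, `study.best_trials` holds the Pareto-optimal trade-offs between the metrics, and you pick the one that suits your use case.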