# Gradient Boosting

The core gradient boosting model for regression and binary classification.
## Basic Usage

```python
import openboost as ob

model = ob.GradientBoosting(
    n_trees=100,
    max_depth=6,
    learning_rate=0.1,
    loss='mse',
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_trees` | int | 100 | Number of boosting iterations |
| `max_depth` | int | 6 | Maximum depth of each tree |
| `learning_rate` | float | 0.1 | Step size shrinkage |
| `loss` | str or callable | `'mse'` | Loss function |
| `min_child_weight` | float | 1.0 | Minimum sum of Hessians in a leaf |
| `reg_lambda` | float | 1.0 | L2 regularization |
| `subsample` | float | 1.0 | Row subsampling ratio |
| `colsample_bytree` | float | 1.0 | Column subsampling ratio |
| `n_bins` | int | 256 | Number of histogram bins |
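As a sketch of how these knobs combine, here is a more regularized, stochastic configuration (parameter names as in the table above; the specific values are illustrative, not recommendations):

```python
import openboost as ob

# Shallower trees, stronger L2 penalty, and row/column subsampling
model = ob.GradientBoosting(
    n_trees=200,
    max_depth=4,
    learning_rate=0.05,
    reg_lambda=5.0,        # heavier L2 regularization on leaf values
    subsample=0.8,         # each tree sees 80% of the rows
    colsample_bytree=0.8,  # each tree sees 80% of the columns
)
```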
## Loss Functions

| Loss | Use Case |
|---|---|
| `'mse'` | Regression (default) |
| `'mae'` | Robust regression |
| `'huber'` | Outlier-robust regression |
| `'logloss'` | Binary classification |
| `'quantile'` | Quantile regression |
| `'tweedie'` | Tweedie regression (use with `tweedie_rho`) |
| Custom callable | Your own loss |
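For binary classification the API is the same; a minimal sketch with `'logloss'` (labels assumed to be 0/1):

```python
import openboost as ob

model = ob.GradientBoosting(n_trees=100, loss='logloss')
model.fit(X_train, y_train)  # y_train holds binary 0/1 labels
predictions = model.predict(X_test)
```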
Custom Loss Function¶
def quantile_loss(pred, y, tau=0.9):
residual = y - pred
grad = np.where(residual > 0, -tau, 1 - tau)
hess = np.ones_like(pred)
return grad, hess
model = ob.GradientBoosting(n_trees=100, loss=quantile_loss)
model.fit(X_train, y_train)
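To sanity-check the sign convention, the gradient can be evaluated on a toy pair of predictions (pure NumPy, restating `quantile_loss` so the snippet is self-contained):

```python
import numpy as np

def quantile_loss(pred, y, tau=0.9):
    residual = y - pred
    grad = np.where(residual > 0, -tau, 1 - tau)
    hess = np.ones_like(pred)
    return grad, hess

pred = np.array([1.0, 3.0])   # one under-prediction, one over-prediction
y = np.array([2.0, 2.0])
grad, hess = quantile_loss(pred, y)
print(grad)  # [-0.9  0.1]: under-predictions are pushed up hard, over-predictions down gently
print(hess)  # [1. 1.]
```

With `tau=0.9` the asymmetry is what makes the fitted values track the 90th percentile rather than the mean.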
## Training with Validation

```python
model = ob.GradientBoosting(n_trees=500, max_depth=6)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[
        ob.EarlyStopping(patience=10),
        ob.Logger(every=10),
    ],
)
```
## Feature Importance

```python
model.fit(X_train, y_train)

# Compute importance
importance = ob.compute_feature_importances(model.trees_)

# Plot
ob.plot_feature_importances(model.trees_, feature_names)
```
## Growth Strategies

```python
# Level-wise (XGBoost-style, default)
model = ob.GradientBoosting(growth='levelwise')

# Leaf-wise (LightGBM-style)
model = ob.GradientBoosting(growth='leafwise')

# Symmetric/Oblivious (CatBoost-style)
model = ob.GradientBoosting(growth='symmetric')
```
## API Reference

### `GradientBoosting` (dataclass)

```python
GradientBoosting(
    n_trees=100,
    max_depth=6,
    learning_rate=0.1,
    loss="mse",
    min_child_weight=1.0,
    reg_lambda=1.0,
    reg_alpha=0.0,
    gamma=0.0,
    subsample=1.0,
    colsample_bytree=1.0,
    n_bins=256,
    quantile_alpha=0.5,
    tweedie_rho=1.5,
    distributed=False,
    n_workers=None,
    subsample_strategy="none",
    goss_top_rate=0.2,
    goss_other_rate=0.1,
    batch_size=None,
    n_gpus=None,
    devices=None,
)
```

Bases: `PersistenceMixin`

Gradient Boosting ensemble model.
A gradient boosting model that supports both built-in loss functions and custom loss functions. When using built-in losses with GPU, training is fully batched for maximum performance.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `n_trees` | int | Number of trees to train. |
| `max_depth` | int | Maximum depth of each tree. |
| `learning_rate` | float | Shrinkage factor applied to each tree. |
| `loss` | str or callable | Loss function: `'mse'` (Mean Squared Error, regression), `'logloss'` (binary cross-entropy, classification), `'huber'` (robust regression), `'mae'` (Mean Absolute Error, L1 regression), `'quantile'` (quantile regression; use with `quantile_alpha`), or a custom callable `(pred, y) -> (grad, hess)`. |
| `min_child_weight` | float | Minimum sum of Hessians in a leaf. |
| `reg_lambda` | float | L2 regularization on leaf values. |
| `n_bins` | int | Number of bins for histogram building. |
| `quantile_alpha` | float | Quantile level for `'quantile'` loss (0 < alpha < 1): 0.5 = median regression (default), 0.9 = 90th percentile, 0.1 = 10th percentile. |
| `tweedie_rho` | float | Variance power for `'tweedie'` loss (1 < rho < 2); 1.5 (default) gives a compound Poisson-Gamma distribution. |
| `subsample_strategy` | str | Sampling strategy for large-scale training: `'none'` (default), `'random'` (random subsampling), or `'goss'` (Gradient-based One-Side Sampling, LightGBM-style). |
| `goss_top_rate` | float | Fraction of top-gradient samples to keep (for GOSS). |
| `goss_other_rate` | float | Fraction of remaining samples to sample (for GOSS). |
| `batch_size` | int or None | Mini-batch size for large datasets. If None, process all at once. |
Examples:

Basic regression:

```python
import openboost as ob

model = ob.GradientBoosting(n_trees=100, loss='mse')
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
Quantile regression (90th percentile):
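A sketch, using the documented `loss='quantile'` and `quantile_alpha` parameters:

```python
model = ob.GradientBoosting(n_trees=100, loss='quantile', quantile_alpha=0.9)
model.fit(X_train, y_train)
upper = model.predict(X_test)  # estimates of the conditional 90th percentile
```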
GOSS for faster training:

```python
model = ob.GradientBoosting(
    n_trees=100,
    subsample_strategy='goss',
    goss_top_rate=0.2,
    goss_other_rate=0.1,
)
```
Multi-GPU training:
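A sketch based on the `n_gpus` parameter in the constructor signature above (its exact semantics are not documented here; `devices` presumably pins specific GPU ids):

```python
model = ob.GradientBoosting(n_trees=500, n_gpus=2)
model.fit(X_train, y_train)
```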
### `fit`

Fit the gradient boosting model.
| PARAMETER | DESCRIPTION |
|---|---|
| `X` | Training features, shape `(n_samples, n_features)`. |
| `y` | Training targets, shape `(n_samples,)`. |
| `callbacks` | List of `Callback` instances for training hooks. Use `EarlyStopping` for early stopping, `Logger` for progress. |
| `eval_set` | List of `(X, y)` tuples for validation (used with callbacks). |
| `sample_weight` | Sample weights, shape `(n_samples,)`. |

| RETURNS | DESCRIPTION |
|---|---|
| `self` | The fitted model. |
### `predict`

Generate predictions for `X`.

| PARAMETER | DESCRIPTION |
|---|---|
| `X` | Features to predict on, shape `(n_samples, n_features)`. Can be a raw NumPy array or a pre-binned `BinnedArray`. |

| RETURNS | DESCRIPTION |
|---|---|
| `predictions` | Shape `(n_samples,)`. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the model is not fitted or `X` has the wrong shape. |