Gradient Boosting

The core gradient boosting model for regression and binary classification.

Basic Usage

import openboost as ob

model = ob.GradientBoosting(
    n_trees=100,
    max_depth=6,
    learning_rate=0.1,
    loss='mse',
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_trees` | int | 100 | Number of boosting iterations |
| `max_depth` | int | 6 | Maximum depth of each tree |
| `learning_rate` | float | 0.1 | Step-size shrinkage |
| `loss` | str/callable | `'mse'` | Loss function |
| `min_child_weight` | float | 1.0 | Minimum sum of hessian in a leaf |
| `reg_lambda` | float | 1.0 | L2 regularization |
| `subsample` | float | 1.0 | Row subsampling ratio |
| `colsample_bytree` | float | 1.0 | Column subsampling ratio |
| `n_bins` | int | 256 | Number of histogram bins |
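To make `n_bins` concrete: histogram-based boosters discretize each feature into at most `n_bins` buckets before split finding, so splits are searched over bin edges rather than raw values. A minimal NumPy sketch of quantile-based binning (`bin_feature` is an illustrative helper, not part of the openboost API):

```python
import numpy as np

def bin_feature(x, n_bins=256):
    """Discretize one feature column into quantile-based bins."""
    # Interior bin edges at evenly spaced quantiles of the data.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    edges = np.unique(edges)  # collapse duplicates for low-cardinality data
    # searchsorted maps each value to its bin index in [0, len(edges)].
    return np.searchsorted(edges, x, side="right").astype(np.uint16)

x = np.random.default_rng(0).normal(size=10_000)
binned = bin_feature(x, n_bins=256)
print(binned.min(), binned.max())  # bin indices fall in [0, 255]
```

Fewer bins means coarser split candidates but smaller histograms and faster training.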

Loss Functions

| Loss | Use Case |
|---|---|
| `'mse'` | Regression (default) |
| `'mae'` | Robust regression |
| `'huber'` | Outlier-robust regression |
| `'logloss'` | Binary classification |
| `'quantile'` | Quantile regression |
| Custom callable | Your own loss |

Custom Loss Function

import numpy as np

def quantile_loss(pred, y, tau=0.9):
    # Pinball (quantile) loss: the gradient is -tau when under-predicting
    # and (1 - tau) when over-predicting. The loss is piecewise linear,
    # so a constant hessian of 1 is used as a surrogate.
    residual = y - pred
    grad = np.where(residual > 0, -tau, 1 - tau)
    hess = np.ones_like(pred)
    return grad, hess

model = ob.GradientBoosting(n_trees=100, loss=quantile_loss)
model.fit(X_train, y_train)
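For another reference point, binary log-loss can be written in the same `(pred, y) -> (grad, hess)` shape. This is a standalone sketch of the standard sigmoid gradient and hessian, not openboost's built-in `'logloss'` implementation:

```python
import numpy as np

def logloss(pred, y):
    # pred holds raw scores (log-odds); p is the predicted probability.
    p = 1.0 / (1.0 + np.exp(-pred))
    grad = p - y          # first derivative of the loss w.r.t. pred
    hess = p * (1.0 - p)  # second derivative, largest at p = 0.5
    return grad, hess

grad, hess = logloss(np.array([0.0]), np.array([1.0]))
print(grad, hess)  # [-0.5] [0.25]
```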

Training with Validation

model = ob.GradientBoosting(n_trees=500, max_depth=6)

model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[
        ob.EarlyStopping(patience=10),
        ob.Logger(every=10),
    ],
)

Feature Importance

model.fit(X_train, y_train)

# Compute importance
importance = ob.compute_feature_importances(model.trees_)

# Plot
ob.plot_feature_importances(model.trees_, feature_names)

Growth Strategies

# Level-wise (XGBoost-style, default)
model = ob.GradientBoosting(growth='levelwise')

# Leaf-wise (LightGBM-style)
model = ob.GradientBoosting(growth='leafwise')

# Symmetric/Oblivious (CatBoost-style)
model = ob.GradientBoosting(growth='symmetric')

API Reference

GradientBoosting dataclass

GradientBoosting(
    n_trees=100,
    max_depth=6,
    learning_rate=0.1,
    loss="mse",
    min_child_weight=1.0,
    reg_lambda=1.0,
    reg_alpha=0.0,
    gamma=0.0,
    subsample=1.0,
    colsample_bytree=1.0,
    n_bins=256,
    quantile_alpha=0.5,
    tweedie_rho=1.5,
    distributed=False,
    n_workers=None,
    subsample_strategy="none",
    goss_top_rate=0.2,
    goss_other_rate=0.1,
    batch_size=None,
    n_gpus=None,
    devices=None,
)

Bases: PersistenceMixin

Gradient Boosting ensemble model.

A gradient boosting model that supports both built-in loss functions and custom loss functions. When using built-in losses on GPU, training is fully batched for maximum performance.

PARAMETER DESCRIPTION
n_trees

Number of trees to train.

TYPE: int DEFAULT: 100

max_depth

Maximum depth of each tree.

TYPE: int DEFAULT: 6

learning_rate

Shrinkage factor applied to each tree.

TYPE: float DEFAULT: 0.1

loss

Loss function. Can be:

- 'mse': Mean Squared Error (regression)
- 'logloss': Binary cross-entropy (classification)
- 'huber': Huber loss (robust regression)
- 'mae': Mean Absolute Error (L1 regression)
- 'quantile': Quantile regression (use with quantile_alpha)
- Callable: Custom function(pred, y) -> (grad, hess)

TYPE: str | LossFunction | Callable[..., tuple] DEFAULT: 'mse'

min_child_weight

Minimum sum of hessian in a leaf.

TYPE: float DEFAULT: 1.0

reg_lambda

L2 regularization on leaf values.

TYPE: float DEFAULT: 1.0
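In the standard second-order formulation, `reg_lambda` enters the optimal leaf value as `leaf = -G / (H + lambda)`, where `G` and `H` are the sums of gradients and hessians of the samples in the leaf. A quick numeric illustration, assuming this standard formula (the sample values are made up):

```python
import numpy as np

grads = np.array([-2.0, -1.0, -3.0])  # gradients of samples in one leaf
hess = np.array([1.0, 1.0, 1.0])      # hessians (constant 1 for MSE)

def leaf_value(g, h, reg_lambda=1.0):
    # Second-order optimal leaf weight with L2 regularization.
    return -g.sum() / (h.sum() + reg_lambda)

print(leaf_value(grads, hess))                   # 1.5, shrunk toward zero
print(leaf_value(grads, hess, reg_lambda=0.0))   # 2.0, unregularized
```

Larger `reg_lambda` shrinks leaf values toward zero, which damps individual trees and reduces overfitting.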

n_bins

Number of bins for histogram building.

TYPE: int DEFAULT: 256

quantile_alpha

Quantile level for 'quantile' loss (0 < alpha < 1).

- 0.5: Median regression (default)
- 0.9: 90th percentile
- 0.1: 10th percentile

TYPE: float DEFAULT: 0.5

tweedie_rho

Variance power for 'tweedie' loss (1 < rho < 2).

- 1.5: Default (compound Poisson-Gamma)

TYPE: float DEFAULT: 1.5

subsample_strategy

Sampling strategy for large-scale training (Phase 17):

- 'none': No sampling (default)
- 'random': Random subsampling
- 'goss': Gradient-based One-Side Sampling (LightGBM-style)

TYPE: Literal['none', 'random', 'goss'] DEFAULT: 'none'

goss_top_rate

Fraction of top-gradient samples to keep (for GOSS).

TYPE: float DEFAULT: 0.2

goss_other_rate

Fraction of remaining samples to sample (for GOSS).

TYPE: float DEFAULT: 0.1
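The two GOSS rates work together: keep the top `goss_top_rate` fraction of samples by absolute gradient, then randomly sample `goss_other_rate` of the remainder, upweighting the sampled remainder so gradient statistics stay approximately unbiased. A NumPy sketch of that selection step (illustrative of the GOSS idea, not openboost internals):

```python
import numpy as np

def goss_select(grads, top_rate=0.2, other_rate=0.1, seed=0):
    n = len(grads)
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)
    order = np.argsort(-np.abs(grads))  # largest |gradient| first
    top = order[:n_top]                 # always kept
    rng = np.random.default_rng(seed)
    other = rng.choice(order[n_top:], size=n_other, replace=False)
    idx = np.concatenate([top, other])
    weights = np.ones(n)
    # Amplify the sampled remainder so gradient sums stay unbiased.
    weights[other] = (1.0 - top_rate) / other_rate
    return idx, weights[idx]

grads = np.random.default_rng(1).normal(size=1000)
idx, w = goss_select(grads)
print(len(idx))  # 300 samples kept out of 1000
```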

batch_size

Mini-batch size for large datasets. If None, process all at once.

TYPE: int | None DEFAULT: None

Examples:

Basic regression:

import openboost as ob

model = ob.GradientBoosting(n_trees=100, loss='mse')
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Quantile regression (90th percentile):

model = ob.GradientBoosting(loss='quantile', quantile_alpha=0.9)
model.fit(X_train, y_train)

GOSS for faster training:

model = ob.GradientBoosting(
    n_trees=100,
    subsample_strategy='goss',
    goss_top_rate=0.2,
    goss_other_rate=0.1,
)

Multi-GPU training:

model = ob.GradientBoosting(n_trees=100, n_gpus=4)
model.fit(X, y)  # Data parallel across 4 GPUs

fit

fit(
    X, y, callbacks=None, eval_set=None, sample_weight=None
)

Fit the gradient boosting model.

PARAMETER DESCRIPTION
X

Training features, shape (n_samples, n_features).

TYPE: NDArray

y

Training targets, shape (n_samples,).

TYPE: NDArray

callbacks

List of Callback instances for training hooks. Use EarlyStopping for early stopping, Logger for progress.

TYPE: list[Callback] | None DEFAULT: None

eval_set

List of (X, y) tuples for validation (used with callbacks).

TYPE: list[tuple[NDArray, NDArray]] | None DEFAULT: None

sample_weight

Sample weights, shape (n_samples,).

TYPE: NDArray | None DEFAULT: None

RETURNS DESCRIPTION
self

The fitted model.

TYPE: GradientBoosting

Example
from openboost import GradientBoosting, EarlyStopping, Logger

model = GradientBoosting(n_trees=1000)
model.fit(
    X, y,
    callbacks=[EarlyStopping(patience=50), Logger(period=10)],
    eval_set=[(X_val, y_val)]
)

predict

predict(X)

Generate predictions for X.

PARAMETER DESCRIPTION
X

Features to predict on, shape (n_samples, n_features). Can be raw numpy array or pre-binned BinnedArray.

TYPE: NDArray | BinnedArray

RETURNS DESCRIPTION
predictions

Shape (n_samples,).

TYPE: NDArray

RAISES DESCRIPTION
ValueError

If the model is not fitted or X has the wrong shape.