Loss Functions

Built-in loss functions and utilities.

Regression Losses

mse_gradient

mse_gradient(pred, y)

Compute MSE gradient and hessian.

Loss:     L = 0.5 * (pred - y)^2
Gradient: dL/dpred = pred - y
Hessian:  d²L/dpred² = 1

Uses the 0.5 * MSE convention (matching XGBoost) so that reg_lambda has equivalent effect across libraries.
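With the 0.5 factor, both derivatives reduce to one-liners. A minimal NumPy sketch of the formulas above (the library's actual implementation may differ):

```python
import numpy as np

def mse_gradient(pred, y):
    """0.5 * (pred - y)^2 convention: grad = pred - y, hess = 1."""
    grad = pred - y
    hess = np.ones_like(pred)  # constant curvature
    return grad, hess
```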

mae_gradient

mae_gradient(pred, y)

Compute MAE (L1) gradient and hessian.

Loss:     L = |pred - y|
Gradient: dL/dpred = sign(pred - y)
Hessian:  d²L/dpred² = 0 (use small constant for GBDT stability)

Note: MAE is not twice-differentiable at pred=y, so we use a small constant hessian. This is the standard approach in XGBoost/LightGBM.
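A sketch of the approach described above; the value of the constant hessian (1.0 here) is an assumption, not necessarily what this library uses:

```python
import numpy as np

def mae_gradient(pred, y):
    grad = np.sign(pred - y)
    # MAE has zero curvature; a constant hessian keeps GBDT leaf
    # values finite (leaf = -sum(grad) / (sum(hess) + reg_lambda)).
    hess = np.ones_like(pred)
    return grad, hess
```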

huber_gradient

huber_gradient(pred, y, delta=1.0)

Compute Huber loss gradient and hessian.

Loss: L = 0.5 * (pred - y)^2                   if |pred - y| <= delta
      L = delta * |pred - y| - 0.5 * delta^2   otherwise
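Differentiating the two branches gives a residual clipped at ±delta. This sketch assumes hess = 1 in the quadratic region and a small constant in the linear region (where the true second derivative is 0, mirroring the MAE note above); the exact stability constant is an assumption:

```python
import numpy as np

def huber_gradient(pred, y, delta=1.0):
    r = pred - y
    quad = np.abs(r) <= delta                 # quadratic region
    grad = np.where(quad, r, delta * np.sign(r))
    hess = np.where(quad, 1.0, 1e-6)          # assumed stability constant
    return grad, hess
```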

quantile_gradient

quantile_gradient(pred, y, alpha=0.5)

Compute Quantile (Pinball) loss gradient and hessian.

Loss: L = alpha * max(y - pred, 0) + (1 - alpha) * max(pred - y, 0)

This is the standard quantile regression loss:
- alpha=0.5: median regression (equivalent to MAE)
- alpha=0.9: 90th percentile
- alpha=0.1: 10th percentile

Gradient: dL/dpred = 1 - alpha   if pred > y (over-prediction)
          dL/dpred = -alpha      if pred < y (under-prediction)

Hessian: constant (the loss is not twice-differentiable at pred = y)

PARAMETER DESCRIPTION
pred

Predictions

TYPE: NDArray

y

Targets

TYPE: NDArray

alpha

Quantile level (0 < alpha < 1)

TYPE: float DEFAULT: 0.5
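A sketch consistent with the dL/dpred convention used for MSE above; the constant hessian value and the tie-handling at pred == y are assumptions:

```python
import numpy as np

def quantile_gradient(pred, y, alpha=0.5):
    # dL/dpred: (1 - alpha) on over-prediction, -alpha otherwise
    grad = np.where(pred > y, 1.0 - alpha, -alpha)
    hess = np.ones_like(pred)  # constant; pinball loss has no curvature
    return grad, hess
```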

Classification Losses

logloss_gradient

logloss_gradient(pred, y)

Compute LogLoss gradient and hessian.

Loss:     L = -y * log(p) - (1 - y) * log(1 - p), where p = sigmoid(pred)
Gradient: dL/dpred = p - y
Hessian:  d²L/dpred² = p * (1 - p)
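The `p - y` gradient follows from the chain rule through the sigmoid. A minimal sketch (the library may use a numerically safer sigmoid):

```python
import numpy as np

def logloss_gradient(pred, y):
    p = 1.0 / (1.0 + np.exp(-pred))  # sigmoid of the raw score
    grad = p - y
    hess = p * (1.0 - p)
    return grad, hess
```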

softmax_gradient

softmax_gradient(pred, y, n_classes)

Compute Softmax cross-entropy gradient and hessian for multi-class.

This returns gradients for ALL classes at once. For GBDT, you typically train K trees per round (one per class).

PARAMETER DESCRIPTION
pred

Predictions, shape (n_samples, n_classes) - raw logits

TYPE: NDArray

y

Labels, shape (n_samples,) - integer class labels (0 to n_classes-1)

TYPE: NDArray

n_classes

Number of classes

TYPE: int

RETURNS DESCRIPTION
grad

Gradients, shape (n_samples, n_classes)

TYPE: NDArray

hess

Hessians, shape (n_samples, n_classes)

TYPE: NDArray

Note: For binary classification, use logloss instead (more efficient).
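A sketch of the per-class gradients described above; the diagonal hessian approximation p * (1 - p) is the standard XGBoost-style choice and is assumed here:

```python
import numpy as np

def softmax_gradient(pred, y, n_classes):
    # numerically stable softmax over the class axis
    z = pred - pred.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]        # (n_samples, n_classes)
    grad = p - onehot
    hess = p * (1.0 - p)                 # diagonal approximation
    return grad, hess
```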

Count/Positive Data Losses

poisson_gradient

poisson_gradient(pred, y)

Compute Poisson deviance gradient and hessian.

For count data (clicks, purchases, etc.). Predictions are in log-space.

Loss:     L = exp(pred) - y * pred (negative log-likelihood)
Gradient: dL/dpred = exp(pred) - y
Hessian:  d²L/dpred² = exp(pred)

Note: y must be non-negative integers (counts).
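Because predictions live in log-space, the predicted mean mu = exp(pred) appears in both derivatives. A minimal sketch:

```python
import numpy as np

def poisson_gradient(pred, y):
    mu = np.exp(pred)  # predicted mean count
    grad = mu - y
    hess = mu
    return grad, hess
```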

gamma_gradient

gamma_gradient(pred, y)

Compute Gamma deviance gradient and hessian.

For positive continuous data (insurance claims, etc.). Predictions are in log-space.

Loss:     L = y * exp(-pred) + pred (negative log-likelihood, ignoring constants)
Gradient: dL/dpred = 1 - y * exp(-pred)
Hessian:  d²L/dpred² = y * exp(-pred)

Note: y must be strictly positive.
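Both derivatives share the term y * exp(-pred), i.e. the target over the predicted mean. A minimal sketch of the formulas above:

```python
import numpy as np

def gamma_gradient(pred, y):
    t = y * np.exp(-pred)  # y divided by the predicted mean
    grad = 1.0 - t
    hess = t
    return grad, hess
```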

tweedie_gradient

tweedie_gradient(pred, y, rho=1.5)

Compute Tweedie deviance gradient and hessian.

Tweedie distribution interpolates between Poisson (rho=1) and Gamma (rho=2). Commonly used for insurance claims with many zeros.

For rho in (1, 2), predictions are in log-space:

Loss: L = -y * exp(pred * (1-rho)) / (1-rho) + exp(pred * (2-rho)) / (2-rho)

PARAMETER DESCRIPTION
pred

Predictions (in log-space)

TYPE: NDArray

y

Targets (non-negative, can have zeros)

TYPE: NDArray

rho

Variance power (1 < rho < 2 for compound Poisson-Gamma)

TYPE: float DEFAULT: 1.5

Note: rho=1.5 is a common default for insurance data.
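Differentiating the loss term by term gives the sketch below (an illustration; the library may clip the exponents for numerical safety):

```python
import numpy as np

def tweedie_gradient(pred, y, rho=1.5):
    a = np.exp(pred * (1.0 - rho))
    b = np.exp(pred * (2.0 - rho))
    grad = -y * a + b                              # dL/dpred
    hess = -y * (1.0 - rho) * a + (2.0 - rho) * b  # d²L/dpred²
    return grad, hess
```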

Utilities

get_loss_function

get_loss_function(loss, **kwargs)

Get a loss function by name or return custom callable.

PARAMETER DESCRIPTION
loss

Loss function name or callable. Available:
- 'mse': Mean Squared Error (regression)
- 'mae': Mean Absolute Error (L1 regression)
- 'huber': Huber loss (robust regression)
- 'logloss': Binary cross-entropy (classification)
- 'quantile': Quantile regression (percentile prediction)
- 'poisson': Poisson deviance (count data)
- 'gamma': Gamma deviance (positive continuous)
- 'tweedie': Tweedie deviance (compound Poisson-Gamma)

TYPE: str | LossFunction

**kwargs

Additional parameters for specific losses:
- quantile_alpha: Quantile level for 'quantile' loss (default 0.5)
- tweedie_rho: Variance power for 'tweedie' loss (default 1.5)

DEFAULT: {}

RETURNS DESCRIPTION
LossFunction

Loss function callable.

Examples:

>>> loss_fn = get_loss_function('mse')
>>> loss_fn = get_loss_function('quantile', quantile_alpha=0.9)
>>> loss_fn = get_loss_function('tweedie', tweedie_rho=1.5)
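Since `loss` also accepts a callable, any function with the same `(pred, y) -> (grad, hess)` signature can be passed in place of a name. A hypothetical pseudo-Huber loss as an illustration (not part of the library):

```python
import numpy as np

def pseudo_huber(pred, y, delta=1.0):
    # smooth Huber variant: L = delta^2 * (sqrt(1 + (r/delta)^2) - 1)
    r = pred - y
    s = np.sqrt(1.0 + (r / delta) ** 2)
    grad = r / s             # dL/dpred
    hess = 1.0 / s ** 3      # d²L/dpred², always positive
    return grad, hess

# loss_fn = get_loss_function(pseudo_huber)
```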