Loss Functions¶
Built-in loss functions and utilities.
Regression Losses¶
mse_gradient¶
Compute MSE gradient and hessian.
Loss: L = 0.5 * (pred - y)^2
Gradient: dL/dpred = pred - y
Hessian: d²L/dpred² = 1
Uses the 0.5 * MSE convention (matching XGBoost) so that reg_lambda has equivalent effect across libraries.
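A minimal NumPy sketch of the computation described above (the exact signature in the library may differ):

```python
import numpy as np

def mse_gradient(pred, y):
    """Gradient and hessian of L = 0.5 * (pred - y)^2."""
    grad = pred - y            # dL/dpred
    hess = np.ones_like(pred)  # d²L/dpred² is the constant 1
    return grad, hess
```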
mae_gradient¶
Compute MAE (L1) gradient and hessian.
Loss: L = |pred - y|
Gradient: dL/dpred = sign(pred - y)
Hessian: d²L/dpred² = 0 (a small constant is used for GBDT stability)
Note: MAE is not twice-differentiable at pred=y, so we use a small constant hessian. This is the standard approach in XGBoost/LightGBM.
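A sketch of the constant-hessian approach described in the note (the constant's value, 1.0 here, is an assumption; implementations vary):

```python
import numpy as np

def mae_gradient(pred, y, hess_const=1.0):
    """Gradient and stand-in hessian of L = |pred - y|."""
    grad = np.sign(pred - y)
    # True second derivative is 0 almost everywhere; a constant
    # keeps GBDT leaf-weight denominators finite.
    hess = np.full_like(pred, hess_const)
    return grad, hess
```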
huber_gradient¶
Compute Huber loss gradient and hessian.
Loss:
    L = 0.5 * (pred - y)^2                  if |pred - y| <= delta
    L = delta * |pred - y| - 0.5 * delta^2  otherwise
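A sketch of the piecewise gradient/hessian this loss implies (the signature and `delta` default are assumptions):

```python
import numpy as np

def huber_gradient(pred, y, delta=1.0):
    """Gradient and hessian of the Huber loss."""
    r = pred - y
    inside = np.abs(r) <= delta
    grad = np.where(inside, r, delta * np.sign(r))
    # Hessian is 1 in the quadratic region, 0 in the linear region;
    # implementations often add a small floor for stability.
    hess = np.where(inside, 1.0, 0.0)
    return grad, hess
```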
quantile_gradient¶
Compute Quantile (Pinball) loss gradient and hessian.
Loss: L = alpha * max(y - pred, 0) + (1 - alpha) * max(pred - y, 0)
This is the standard quantile regression loss:
- alpha=0.5: Median regression (equivalent to MAE)
- alpha=0.9: 90th percentile
- alpha=0.1: 10th percentile
Gradient:
    dL/dpred = 1 - alpha  if pred > y (over-prediction)
    dL/dpred = -alpha     if pred < y (under-prediction)
Hessian: a small constant is used (the loss is not twice-differentiable)
| PARAMETER | DESCRIPTION |
|---|---|
| pred | Predictions |
| y | Targets |
| alpha | Quantile level (0 < alpha < 1) |
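A sketch of the pinball gradient, using the dL/dpred convention that the MSE and LogLoss losses above follow (signature and constant hessian value are assumptions):

```python
import numpy as np

def quantile_gradient(pred, y, alpha=0.5):
    """Gradient and stand-in hessian of the pinball loss."""
    # dL/dpred: (1 - alpha) on over-prediction, -alpha on under-prediction
    grad = np.where(pred > y, 1.0 - alpha, -alpha)
    hess = np.ones_like(pred)  # loss is piecewise linear, so use a constant
    return grad, hess
```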
Classification Losses¶
logloss_gradient¶
Compute LogLoss gradient and hessian.
Loss: L = -y * log(p) - (1 - y) * log(1 - p), where p = sigmoid(pred)
Gradient: dL/dpred = p - y
Hessian: d²L/dpred² = p * (1 - p)
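A minimal sketch of these formulas (the library's actual implementation may add numerically stabler sigmoid handling):

```python
import numpy as np

def logloss_gradient(pred, y):
    """Gradient and hessian of binary cross-entropy on raw scores."""
    p = 1.0 / (1.0 + np.exp(-pred))  # sigmoid of the raw score
    return p - y, p * (1.0 - p)
```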
softmax_gradient¶
Compute Softmax cross-entropy gradient and hessian for multi-class.
This returns gradients for ALL classes at once. For GBDT, you typically train K trees per round (one per class).
| PARAMETER | DESCRIPTION |
|---|---|
| pred | Predictions, shape (n_samples, n_classes), raw logits |
| y | Labels, shape (n_samples,), integer class labels (0 to n_classes - 1) |
| n_classes | Number of classes |

| RETURNS | DESCRIPTION |
|---|---|
| grad | Gradients, shape (n_samples, n_classes) |
| hess | Hessians, shape (n_samples, n_classes) |
Note: For binary classification, use logloss instead (more efficient).
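A sketch of the all-classes-at-once computation described above (the diagonal-hessian form is an assumption; some libraries scale it by 2):

```python
import numpy as np

def softmax_gradient(pred, y, n_classes):
    """Per-class gradients and diagonal hessians for softmax cross-entropy."""
    # Row-wise softmax with max-subtraction for numerical stability
    z = pred - pred.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]   # one-hot encode integer labels
    grad = p - onehot               # dL/dpred_k = p_k - [y == k]
    hess = p * (1.0 - p)            # diagonal approximation of the full Hessian
    return grad, hess
```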
Count/Positive Data Losses¶
poisson_gradient¶
Compute Poisson deviance gradient and hessian.
For count data (clicks, purchases, etc.). Predictions are in log-space.
Loss: L = exp(pred) - y * pred (negative log-likelihood)
Gradient: dL/dpred = exp(pred) - y
Hessian: d²L/dpred² = exp(pred)
Note: y must be non-negative integers (counts).
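A minimal sketch of these log-space formulas (signature assumed):

```python
import numpy as np

def poisson_gradient(pred, y):
    """Gradient and hessian of Poisson deviance; pred is log(mean)."""
    mu = np.exp(pred)   # predicted mean count
    return mu - y, mu
```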
gamma_gradient¶
Compute Gamma deviance gradient and hessian.
For positive continuous data (insurance claims, etc.). Predictions are in log-space.
Loss: L = y * exp(-pred) + pred (negative log-likelihood, ignoring constants)
Gradient: dL/dpred = 1 - y * exp(-pred)
Hessian: d²L/dpred² = y * exp(-pred)
Note: y must be strictly positive.
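A minimal sketch of these formulas (signature assumed):

```python
import numpy as np

def gamma_gradient(pred, y):
    """Gradient and hessian of Gamma deviance; pred is in log-space."""
    e = y * np.exp(-pred)   # shared term in gradient and hessian
    return 1.0 - e, e
```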
tweedie_gradient¶
Compute Tweedie deviance gradient and hessian.
Tweedie distribution interpolates between Poisson (rho=1) and Gamma (rho=2). Commonly used for insurance claims with many zeros.
For rho in (1, 2), predictions are in log-space:
Loss: L = -y * exp(pred * (1-rho)) / (1-rho) + exp(pred * (2-rho)) / (2-rho)
| PARAMETER | DESCRIPTION |
|---|---|
| pred | Predictions (in log-space) |
| y | Targets (non-negative, can have zeros) |
| rho | Variance power (1 < rho < 2 for compound Poisson-Gamma) |
Note: rho=1.5 is a common default for insurance data.
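Differentiating the loss above term by term gives the sketch below (signature assumed; the gradient/hessian formulas follow directly from the stated loss):

```python
import numpy as np

def tweedie_gradient(pred, y, rho=1.5):
    """Gradient and hessian of Tweedie deviance; pred is in log-space."""
    a = np.exp(pred * (1.0 - rho))
    b = np.exp(pred * (2.0 - rho))
    grad = -y * a + b                               # dL/dpred
    hess = -y * (1.0 - rho) * a + (2.0 - rho) * b   # d²L/dpred², >= 0 for rho in (1, 2)
    return grad, hess
```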
Utilities¶
get_loss_function¶
Get a loss function by name or return custom callable.
| PARAMETER | DESCRIPTION |
|---|---|
| loss | Loss function name or callable. Available: 'mse' (Mean Squared Error, regression); 'mae' (Mean Absolute Error, L1 regression); 'huber' (Huber loss, robust regression); 'logloss' (binary cross-entropy, classification); 'quantile' (quantile regression, percentile prediction); 'poisson' (Poisson deviance, count data); 'gamma' (Gamma deviance, positive continuous); 'tweedie' (Tweedie deviance, compound Poisson-Gamma) |
| **kwargs | Additional parameters for specific losses: quantile_alpha, quantile level for 'quantile' loss (default 0.5); tweedie_rho, variance power for 'tweedie' loss (default 1.5) |

| RETURNS | DESCRIPTION |
|---|---|
| LossFunction | Loss function callable |
Examples:
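A self-contained sketch of how such a name-or-callable lookup might work, with a single built-in registered for illustration (the internals here are assumptions, not the module's actual code):

```python
import numpy as np

def mse(pred, y):
    """Example built-in: MSE gradient and hessian."""
    return pred - y, np.ones_like(pred)

_LOSSES = {"mse": mse}  # hypothetical name -> function registry

def get_loss_function(loss, **kwargs):
    """Resolve a loss by name, or pass a custom callable through."""
    if callable(loss):
        return loss
    return _LOSSES[loss]

# Usage: look up by name, then evaluate like any other loss
grad, hess = get_loss_function("mse")(np.array([3.0]), np.array([1.0]))
```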