Loss Functions¶
Built-in loss functions and utilities.
Regression Losses¶
mse_gradient¶
Compute MSE gradient and hessian.
Loss: L = 0.5 * (pred - y)^2
Gradient: dL/dpred = pred - y
Hessian: d²L/dpred² = 1
Uses the 0.5 * MSE convention (matching XGBoost) so that reg_lambda has equivalent effect across libraries.
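A minimal NumPy sketch of the computation described above (the exact signature in the library may differ):

```python
import numpy as np

def mse_gradient(pred, y):
    """Gradient and hessian of L = 0.5 * (pred - y)^2."""
    grad = pred - y            # dL/dpred
    hess = np.ones_like(pred)  # d²L/dpred² is the constant 1
    return grad, hess
```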
mae_gradient¶
Compute MAE (L1) gradient and hessian.
Loss: L = |pred - y|
Gradient: dL/dpred = sign(pred - y)
Hessian: d²L/dpred² = 0 (a small constant is used for GBDT stability)
Note: MAE is not twice-differentiable at pred=y, so we use a small constant hessian. This is the standard approach in XGBoost/LightGBM.
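A sketch of the constant-hessian approach described in the note (the constant's value, 1.0 here, is an assumption; implementations vary):

```python
import numpy as np

def mae_gradient(pred, y, hess_const=1.0):
    """Gradient and stand-in hessian of L = |pred - y|."""
    grad = np.sign(pred - y)
    # True second derivative is 0 almost everywhere; a constant
    # keeps GBDT leaf-weight denominators finite.
    hess = np.full_like(pred, hess_const)
    return grad, hess
```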
huber_gradient¶
Compute Huber loss gradient and hessian.
Loss:
    L = 0.5 * (pred - y)^2                  if |pred - y| <= delta
    L = delta * |pred - y| - 0.5 * delta^2  otherwise
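A sketch of the piecewise gradient/hessian this loss implies (the signature and `delta` default are assumptions):

```python
import numpy as np

def huber_gradient(pred, y, delta=1.0):
    """Gradient and hessian of the Huber loss."""
    r = pred - y
    inside = np.abs(r) <= delta
    grad = np.where(inside, r, delta * np.sign(r))
    # Hessian is 1 in the quadratic region, 0 in the linear region;
    # implementations often add a small floor for stability.
    hess = np.where(inside, 1.0, 0.0)
    return grad, hess
```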
quantile_gradient¶
Compute Quantile (Pinball) loss gradient and hessian.
Loss: L = alpha * max(y - pred, 0) + (1 - alpha) * max(pred - y, 0)
This is the standard quantile regression loss:
- alpha=0.5: Median regression (equivalent to MAE)
- alpha=0.9: 90th percentile
- alpha=0.1: 10th percentile
Gradient:
    dL/dpred = 1 - alpha  if pred > y (over-prediction)
    dL/dpred = -alpha     if pred < y (under-prediction)
Hessian: a small constant is used (the loss is not twice-differentiable)
| PARAMETER | DESCRIPTION |
|---|---|
| pred | Predictions |
| y | Targets |
| alpha | Quantile level (0 < alpha < 1) |
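A sketch of the pinball gradient, using the dL/dpred convention that the MSE and LogLoss losses above follow (signature and constant hessian value are assumptions):

```python
import numpy as np

def quantile_gradient(pred, y, alpha=0.5):
    """Gradient and stand-in hessian of the pinball loss."""
    # dL/dpred: (1 - alpha) on over-prediction, -alpha on under-prediction
    grad = np.where(pred > y, 1.0 - alpha, -alpha)
    hess = np.ones_like(pred)  # loss is piecewise linear, so use a constant
    return grad, hess
```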
Classification Losses¶
logloss_gradient¶
Compute LogLoss gradient and hessian.
Loss: L = -y * log(p) - (1 - y) * log(1 - p), where p = sigmoid(pred)
Gradient: dL/dpred = p - y
Hessian: d²L/dpred² = p * (1 - p)
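A minimal sketch of these formulas (the library's actual implementation may add numerically stabler sigmoid handling):

```python
import numpy as np

def logloss_gradient(pred, y):
    """Gradient and hessian of binary cross-entropy on raw scores."""
    p = 1.0 / (1.0 + np.exp(-pred))  # sigmoid of the raw score
    return p - y, p * (1.0 - p)
```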
softmax_gradient¶
Compute Softmax cross-entropy gradient and hessian for multi-class.
This returns gradients for ALL classes at once. For GBDT, you typically train K trees per round (one per class).
| PARAMETER | DESCRIPTION |
|---|---|
| pred | Predictions, shape (n_samples, n_classes), raw logits |
| y | Labels, shape (n_samples,), integer class labels (0 to n_classes - 1) |
| n_classes | Number of classes |

| RETURNS | DESCRIPTION |
|---|---|
| grad | Gradients, shape (n_samples, n_classes) |
| hess | Hessians, shape (n_samples, n_classes) |
Note: For binary classification, use logloss instead (more efficient).
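A sketch of the all-classes-at-once computation described above (the diagonal-hessian form is an assumption; some libraries scale it by 2):

```python
import numpy as np

def softmax_gradient(pred, y, n_classes):
    """Per-class gradients and diagonal hessians for softmax cross-entropy."""
    # Row-wise softmax with max-subtraction for numerical stability
    z = pred - pred.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]   # one-hot encode integer labels
    grad = p - onehot               # dL/dpred_k = p_k - [y == k]
    hess = p * (1.0 - p)            # diagonal approximation of the full Hessian
    return grad, hess
```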
Count/Positive Data Losses¶
poisson_gradient¶
Compute Poisson deviance gradient and hessian.
For count data (clicks, purchases, etc.). Predictions are in log-space.
Loss: L = exp(pred) - y * pred (negative log-likelihood)
Gradient: dL/dpred = exp(pred) - y
Hessian: d²L/dpred² = exp(pred)
Note: y must be non-negative integers (counts).
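A minimal sketch of these log-space formulas (signature assumed):

```python
import numpy as np

def poisson_gradient(pred, y):
    """Gradient and hessian of Poisson deviance; pred is log(mean)."""
    mu = np.exp(pred)   # predicted mean count
    return mu - y, mu
```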
gamma_gradient¶
Compute Gamma deviance gradient and hessian.
For positive continuous data (insurance claims, etc.). Predictions are in log-space.
Loss: L = y * exp(-pred) + pred (negative log-likelihood, ignoring constants)
Gradient: dL/dpred = 1 - y * exp(-pred)
Hessian: d²L/dpred² = y * exp(-pred)
Note: y must be strictly positive.
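A minimal sketch of these formulas (signature assumed):

```python
import numpy as np

def gamma_gradient(pred, y):
    """Gradient and hessian of Gamma deviance; pred is in log-space."""
    e = y * np.exp(-pred)   # shared term in gradient and hessian
    return 1.0 - e, e
```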
tweedie_gradient¶
Compute Tweedie deviance gradient and hessian.
Tweedie distribution interpolates between Poisson (rho=1) and Gamma (rho=2). Commonly used for insurance claims with many zeros.
For rho in (1, 2), predictions are in log-space:
Loss: L = -y * exp(pred * (1-rho)) / (1-rho) + exp(pred * (2-rho)) / (2-rho)
| PARAMETER | DESCRIPTION |
|---|---|
| pred | Predictions (in log-space) |
| y | Targets (non-negative, can have zeros) |
| rho | Variance power (1 < rho < 2 for compound Poisson-Gamma) |
Note: rho=1.5 is a common default for insurance data.
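Differentiating the loss above term by term gives the sketch below (signature assumed; the gradient/hessian formulas follow directly from the stated loss):

```python
import numpy as np

def tweedie_gradient(pred, y, rho=1.5):
    """Gradient and hessian of Tweedie deviance; pred is in log-space."""
    a = np.exp(pred * (1.0 - rho))
    b = np.exp(pred * (2.0 - rho))
    grad = -y * a + b                               # dL/dpred
    hess = -y * (1.0 - rho) * a + (2.0 - rho) * b   # d²L/dpred², >= 0 for rho in (1, 2)
    return grad, hess
```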
Utilities¶
get_loss_function¶
Get a loss function by name or return custom callable.
| PARAMETER | DESCRIPTION |
|---|---|
| loss | Loss function name or callable. Available: 'mse' (Mean Squared Error, regression); 'mae' (Mean Absolute Error, L1 regression); 'huber' (Huber loss, robust regression); 'logloss' (binary cross-entropy, classification); 'quantile' (quantile regression, percentile prediction); 'poisson' (Poisson deviance, count data); 'gamma' (Gamma deviance, positive continuous); 'tweedie' (Tweedie deviance, compound Poisson-Gamma) |
| **kwargs | Additional parameters for specific losses: quantile_alpha, quantile level for 'quantile' loss (default 0.5); tweedie_rho, variance power for 'tweedie' loss (default 1.5) |

| RETURNS | DESCRIPTION |
|---|---|
| LossFunction | Loss function callable |
Examples:
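A self-contained sketch of how such a name-or-callable lookup might work, with a single built-in registered for illustration (the internals here are assumptions, not the module's actual code):

```python
import numpy as np

def mse(pred, y):
    """Example built-in: MSE gradient and hessian."""
    return pred - y, np.ones_like(pred)

_LOSSES = {"mse": mse}  # hypothetical name -> function registry

def get_loss_function(loss, **kwargs):
    """Resolve a loss by name, or pass a custom callable through."""
    if callable(loss):
        return loss
    return _LOSSES[loss]

# Usage: look up by name, then evaluate like any other loss
grad, hess = get_loss_function("mse")(np.array([3.0]), np.array([1.0]))
```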