Distributions
Probability distributions for NaturalBoost.
Built-in Distributions
Normal
Bases: Distribution
Normal (Gaussian) distribution.
| PARAMETER | DESCRIPTION |
|---|---|
| loc | Mean, unbounded |
| scale | Standard deviation, must be positive |
Link functions:
- loc: identity (unbounded)
- scale: exp (ensures σ > 0)
PDF: p(y) = (1/√(2πσ²)) exp(-(y-μ)²/(2σ²))
NLL: 0.5 * log(2πσ²) + (y-μ)²/(2σ²)
nll_gradient
Compute gradients of NLL w.r.t. raw parameters.
NLL = 0.5 * log(2πσ²) + (y - μ)² / (2σ²)
For loc (identity link):
- d(NLL)/dμ = -(y - μ) / σ²
- d²(NLL)/dμ² = 1 / σ²
For scale with exp link (σ = exp(s)):
- d(NLL)/ds = 1 - (y - μ)² / σ²
- d²(NLL)/ds² ≈ 2 (the expected hessian; the observed value 2(y - μ)²/σ² has expectation 2 under the model)
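A minimal NumPy sketch of these formulas, assuming the raw scale parameter is s = log σ. The names normal_nll and normal_grad_hess are hypothetical helpers, not the library's API; the finite-difference check at the end confirms the analytic gradients.

```python
import numpy as np


def normal_nll(y, loc, s):
    """Per-sample NLL of Normal(loc, sigma) with raw scale s = log(sigma)."""
    sigma2 = np.exp(2.0 * s)
    return 0.5 * np.log(2.0 * np.pi * sigma2) + (y - loc) ** 2 / (2.0 * sigma2)


def normal_grad_hess(y, loc, s):
    """Gradient and hessian of the NLL w.r.t. the raw parameters (loc, s)."""
    sigma2 = np.exp(2.0 * s)
    grad_loc = -(y - loc) / sigma2
    hess_loc = 1.0 / sigma2
    grad_s = 1.0 - (y - loc) ** 2 / sigma2
    hess_s = np.full_like(grad_s, 2.0)  # expected hessian, as in the formulas above
    return {"loc": (grad_loc, hess_loc), "scale": (grad_s, hess_s)}


# Compare the analytic gradients with central finite differences.
y = np.array([0.3, -1.2, 2.5])
loc = np.array([0.0, 0.5, 1.0])
s = np.array([0.1, -0.3, 0.7])
h = 1e-6
gh = normal_grad_hess(y, loc, s)
fd_loc = (normal_nll(y, loc + h, s) - normal_nll(y, loc - h, s)) / (2 * h)
fd_s = (normal_nll(y, loc, s + h) - normal_nll(y, loc, s - h)) / (2 * h)
print(np.allclose(gh["loc"][0], fd_loc), np.allclose(gh["scale"][0], fd_s))
```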
fisher_information
Fisher information matrix for Normal distribution.
For Normal with exp link on scale (raw parameters μ and s = log σ):
F = [[1/σ², 0], [0, 2]]
The off-diagonal is 0 because mean and variance are orthogonal parameters in the normal family.
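As a sanity check of this matrix (not part of the library), the Fisher information can be estimated by Monte Carlo as the average outer product of the per-sample NLL gradients under the model; with many draws it should approach diag(1/σ², 2).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
y = rng.normal(mu, sigma, size=1_000_000)

# Per-sample NLL gradients w.r.t. the raw parameters (mu, s = log sigma).
# The score is their negative; the sign cancels in the outer product.
g = np.stack([-(y - mu) / sigma**2, 1.0 - (y - mu) ** 2 / sigma**2], axis=1)
fisher_mc = g.T @ g / len(y)  # Monte Carlo estimate of E[g g^T]

fisher_exact = np.array([[1.0 / sigma**2, 0.0], [0.0, 2.0]])
print(fisher_mc)
print(fisher_exact)
```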
LogNormal
Bases: Distribution
Log-Normal distribution for positive continuous data.
If X ~ LogNormal(μ, σ), then log(X) ~ Normal(μ, σ).
| PARAMETER | DESCRIPTION |
|---|---|
| loc | Mean of the underlying normal |
| scale | Standard deviation of the underlying normal (must be positive) |
Link functions:
- loc: identity
- scale: exp
Mean: exp(μ + σ²/2)
Variance: (exp(σ²) - 1) * exp(2μ + σ²)
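The moment formulas can be checked numerically. This sketch relies on NumPy's Generator.lognormal, whose mean and sigma arguments are the parameters of the underlying normal, i.e. the same loc and scale as above.

```python
import numpy as np

mu, sigma = 0.2, 0.5  # parameters of the underlying normal
mean_formula = np.exp(mu + sigma**2 / 2)
var_formula = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

rng = np.random.default_rng(42)
x = rng.lognormal(mean=mu, sigma=sigma, size=2_000_000)

print(mean_formula, x.mean())  # formula vs Monte Carlo mean
print(var_formula, x.var())  # formula vs Monte Carlo variance
print(np.log(x).mean(), np.log(x).std())  # close to mu and sigma
```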
Gamma
Bases: Distribution
Gamma distribution for positive continuous data.
Parameterization: shape (α) and rate (β)
- Mean = α/β
- Variance = α/β²
| PARAMETER | DESCRIPTION |
|---|---|
| concentration | Shape parameter (α), must be positive |
| rate | Rate parameter (β), must be positive |
Link functions: exp for both (ensures positivity).
Alternatively, the Gamma can be parameterized by mean and dispersion.
init_params
Initialize using method of moments.
mean = α/β, var = α/β² => β = mean/var, α = mean * β = mean²/var
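A sketch of this initialization, assuming the raw values are the logs of α and β (exp links on both, as stated above). gamma_init_params is a hypothetical name; note that NumPy's gamma sampler, used here only to generate test data, takes scale = 1/β rather than the rate.

```python
import numpy as np


def gamma_init_params(y):
    """Method-of-moments start for Gamma(alpha, beta), returned as raw (log) values."""
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var()
    beta = mean / var
    alpha = mean * beta  # equals mean**2 / var
    # Both parameters use an exp link, so the raw values are their logs.
    return {"concentration": np.log(alpha), "rate": np.log(beta)}


rng = np.random.default_rng(0)
y = rng.gamma(shape=3.0, scale=1 / 2.0, size=500_000)  # alpha = 3, beta = 2
raw = gamma_init_params(y)
print(np.exp(raw["concentration"]), np.exp(raw["rate"]))  # should be close to 3 and 2
```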
nll_gradient
Gradients for Gamma distribution.
NLL = -α log(β) + log(Γ(α)) - (α-1) log(y) + β*y
d(NLL)/dα = -log(β) + ψ(α) - log(y), where ψ is the digamma function
d(NLL)/dβ = -α/β + y
With exp links (α = exp(a), β = exp(b)):
- d(NLL)/da = α * (-log(β) + ψ(α) - log(y))
- d(NLL)/db = β * (-α/β + y) = -α + β*y
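The chain rule through the exp links can be verified with finite differences. This sketch uses scipy.special.gammaln for log Γ and scipy.special.digamma for ψ; the helper names are hypothetical, and SciPy is an assumption about the available dependencies, not necessarily what the library uses internally.

```python
import numpy as np
from scipy.special import digamma, gammaln


def gamma_nll(y, a, b):
    """Per-sample Gamma NLL with raw parameters a = log(alpha), b = log(beta)."""
    alpha, beta = np.exp(a), np.exp(b)
    return -alpha * np.log(beta) + gammaln(alpha) - (alpha - 1) * np.log(y) + beta * y


def gamma_grad(y, a, b):
    """Gradients of the NLL w.r.t. a and b (chain rule through the exp links)."""
    alpha, beta = np.exp(a), np.exp(b)
    grad_a = alpha * (-np.log(beta) + digamma(alpha) - np.log(y))
    grad_b = -alpha + beta * y
    return grad_a, grad_b


y = np.array([0.5, 1.7, 3.2])
a = np.array([0.4, 1.0, -0.2])
b = np.array([0.1, -0.5, 0.3])
h = 1e-6
ga, gb = gamma_grad(y, a, b)
fd_a = (gamma_nll(y, a + h, b) - gamma_nll(y, a - h, b)) / (2 * h)
fd_b = (gamma_nll(y, a, b + h) - gamma_nll(y, a, b - h)) / (2 * h)
print(np.allclose(ga, fd_a), np.allclose(gb, fd_b))
```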
Poisson
Bases: Distribution
Poisson distribution for count data.
Single parameter: rate (λ)
- Mean = λ
- Variance = λ
Link function: exp (ensures λ > 0)
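The section does not list Poisson gradients. Assuming the same raw-parameter convention as the other distributions (λ = exp(r)), differentiating NLL = λ - y log λ + log(y!) gives d(NLL)/dr = λ - y and d²(NLL)/dr² = λ; since this hessian does not depend on y, it equals the expected hessian (the Fisher information). A sketch with a finite-difference check, using hypothetical helper names:

```python
import math

import numpy as np

log_factorial = np.vectorize(lambda v: math.lgamma(v + 1.0))


def poisson_nll(y, r):
    """Per-sample Poisson NLL with raw parameter r = log(lambda)."""
    return np.exp(r) - y * r + log_factorial(y)


def poisson_grad_hess(y, r):
    """Gradient and hessian of the NLL w.r.t. r (exp link on lambda)."""
    lam = np.exp(r)
    return lam - y, lam  # the hessian equals the Fisher information here


y = np.array([0.0, 2.0, 5.0])
r = np.log(np.array([1.0, 2.0, 3.0]))
grad, hess = poisson_grad_hess(y, r)
h = 1e-6
fd = (poisson_nll(y, r + h) - poisson_nll(y, r - h)) / (2 * h)
print(grad, fd)  # analytic vs finite-difference gradient
```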
StudentT
Bases: Distribution
Student-t distribution for heavy-tailed data.
| PARAMETER | DESCRIPTION |
|---|---|
| loc | Location parameter |
| scale | Scale parameter (positive) |
| df | Degrees of freedom (positive, typically > 2) |
As ν → ∞, the Student-t approaches the Normal distribution; lower ν gives heavier tails.
Link functions:
- loc: identity
- scale: exp
- df: softplus (ensures ν > 0; softplus alone does not enforce the typical ν > 2)
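A sketch of the Student-t NLL with these links, assuming scalar parameters for brevity (the helpers are hypothetical, not the library's). It uses the standard density p(y) = Γ((ν+1)/2) / (Γ(ν/2) √(νπ) σ) · (1 + z²/ν)^(-(ν+1)/2) with z = (y - μ)/σ, and illustrates both remarks above: a small ν penalizes outliers far less than the Normal, and a very large ν reproduces the Normal NLL.

```python
import math

import numpy as np


def student_t_nll(y, loc, scale, df):
    """Per-sample Student-t NLL for array y and scalar loc, scale, df."""
    z2 = ((y - loc) / scale) ** 2
    return (
        -math.lgamma((df + 1) / 2)
        + math.lgamma(df / 2)
        + 0.5 * math.log(df * math.pi * scale**2)
        + (df + 1) / 2 * np.log1p(z2 / df)
    )


def normal_nll(y, loc, scale):
    return 0.5 * np.log(2 * np.pi * scale**2) + (y - loc) ** 2 / (2 * scale**2)


def softplus(x):
    """log(1 + exp(x)), computed without overflow; maps any real to (0, inf)."""
    return np.logaddexp(0.0, x)


y = np.array([0.0, 1.0, 4.0])
df = float(softplus(1.0))  # raw value 1.0 -> df of about 1.31 (very heavy tails)
print(student_t_nll(y, 0.0, 1.0, df))  # the outlier y = 4 costs far less than under a Normal
print(normal_nll(y, 0.0, 1.0))
print(student_t_nll(y, 0.0, 1.0, 1e6) - normal_nll(y, 0.0, 1.0))  # close to 0
```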
Tweedie
Bases: Distribution
Tweedie distribution for zero-inflated, non-negative continuous data (exact zeros plus positive values).
Key use cases: insurance claims, revenue forecasting with zeros.
Popular in Kaggle competitions:
- Porto Seguro Safe Driver Prediction
- Allstate Claims Severity
- Any competition with zero-inflated positive targets
The Tweedie family is indexed by the variance power ρ, with Var(Y) = φ μ^ρ:
- ρ = 1: Poisson (count data)
- 1 < ρ < 2: compound Poisson-Gamma (zeros + positive continuous values)
- ρ = 2: Gamma (positive continuous)
| PARAMETER | DESCRIPTION |
|---|---|
| mu | Mean parameter (positive) |
| phi | Dispersion parameter (positive) |
Why use this instead of XGBoost?
- XGBoost's Tweedie objective outputs only point estimates
- NGBoost Tweedie outputs a full distribution → prediction intervals, uncertainty quantification, probabilistic forecasts
Link functions:
- mu: log (ensures μ > 0)
- phi: log (ensures φ > 0)
Initialize Tweedie with a variance power parameter.
| PARAMETER | DESCRIPTION |
|---|---|
| power | Variance power (1 < power < 2 for compound Poisson-Gamma). 1.5 is a common default in Kaggle competitions. |
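The compound Poisson-Gamma case (1 < power < 2) can be simulated directly, which makes the point mass at zero visible. This sketch assumes the standard Tweedie mapping with Var(Y) = φ μ^power: a Poisson count N with rate λ = μ^(2-p)/(φ(2-p)), and Gamma jumps with shape (2-p)/(p-1) and scale φ(p-1)μ^(p-1). sample_tweedie is a hypothetical helper; the section does not say how the library samples.

```python
import numpy as np


def sample_tweedie(mu, phi, power, size, rng):
    """Compound Poisson-Gamma draws for 1 < power < 2 (hypothetical helper)."""
    lam = mu ** (2 - power) / (phi * (2 - power))  # Poisson rate of the jump count N
    shape = (2 - power) / (power - 1)  # Gamma shape of each jump
    scale = phi * (power - 1) * mu ** (power - 1)  # Gamma scale of each jump
    n = rng.poisson(lam, size=size)
    y = np.zeros(size)
    pos = n > 0
    # The sum of n i.i.d. Gamma(shape, scale) jumps is Gamma(n * shape, scale).
    y[pos] = rng.gamma(n[pos] * shape, scale)
    return y, lam


rng = np.random.default_rng(0)
mu, phi, power = 2.0, 1.5, 1.5
y, lam = sample_tweedie(mu, phi, power, 1_000_000, rng)
print((y == 0).mean(), np.exp(-lam))  # share of exact zeros vs P(N = 0)
print(y.mean(), mu)  # sample mean vs mu
print(y.var(), phi * mu**power)  # sample variance vs phi * mu^power
```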
NegativeBinomial
Bases: Distribution
Negative Binomial distribution for overdispersed count data.
Key use cases: sales forecasting, demand prediction, click counts.
Popular in Kaggle competitions:
- Rossmann Store Sales
- Bike Sharing Demand
- Grupo Bimbo Inventory Demand
- Any competition with count data where variance > mean
Compared to Poisson:
- Poisson: Var(Y) = Mean(Y)
- NegBin: Var(Y) = Mean(Y) + Mean(Y)²/r (overdispersion)
| PARAMETER | DESCRIPTION |
|---|---|
| mu | Mean parameter (positive) |
| r | Dispersion parameter (positive; smaller r means more overdispersion) |
Why use this instead of XGBoost?
- XGBoost's count objectives output only a point prediction, not a count distribution
- NGBoost NegBin outputs a full distribution → prediction intervals, probability of exceeding thresholds, demand planning
Link functions:
- mu: log (ensures μ > 0)
- r: log (ensures r > 0)
init_params
Initialize using method of moments.
Mean = μ
Var = μ + μ²/r
=> r = μ² / (Var - μ), which requires Var > μ (overdispersion)
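A sketch of this initialization with log links on both parameters. The clamp on Var - μ is an assumption added here (the formula is undefined when the data are not overdispersed), and negbin_init_params is a hypothetical name. The test data come from NumPy's negative_binomial(n, p), which matches (μ, r) when n = r and p = r/(r + μ).

```python
import numpy as np


def negbin_init_params(y, min_excess=1e-6):
    """Method-of-moments start for NegativeBinomial(mu, r), as raw (log) values."""
    y = np.asarray(y, dtype=float)
    mu, var = y.mean(), y.var()
    # r = mu^2 / (Var - mu) needs Var > mu; the clamp (an assumption) keeps r finite.
    r = mu**2 / max(var - mu, min_excess)
    return {"mu": np.log(mu), "r": np.log(r)}


rng = np.random.default_rng(1)
mu_true, r_true = 10.0, 4.0
# numpy's negative_binomial(n, p) has mean n(1 - p)/p; n = r, p = r/(r + mu) gives mean mu.
y = rng.negative_binomial(r_true, r_true / (r_true + mu_true), size=500_000)
raw = negbin_init_params(y)
print(np.exp(raw["mu"]), np.exp(raw["r"]))  # should be close to 10 and 4
```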
nll_gradient
Gradients for Negative Binomial.
NLL = -log Γ(y+r) + log Γ(r) + log Γ(y+1) - r log(r/(r+μ)) - y log(μ/(r+μ))
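The section gives only the NLL. Differentiating it by hand (so these formulas are a derivation, not the library's code) and applying the log links μ = exp(m), r = exp(k) gives d(NLL)/dm = r(μ - y)/(r + μ) and d(NLL)/dk = r · [ψ(r) - ψ(y + r) + log((r + μ)/r) - 1 + (r + y)/(r + μ)]. A sketch with a finite-difference check, assuming SciPy for gammaln and digamma:

```python
import numpy as np
from scipy.special import digamma, gammaln


def negbin_nll(y, m, k):
    """Per-sample NLL with raw parameters m = log(mu), k = log(r)."""
    mu, r = np.exp(m), np.exp(k)
    return (
        -gammaln(y + r) + gammaln(r) + gammaln(y + 1)
        - r * np.log(r / (r + mu))
        - y * np.log(mu / (r + mu))
    )


def negbin_grad(y, m, k):
    """Gradients of the NLL w.r.t. the raw (log-link) parameters m and k."""
    mu, r = np.exp(m), np.exp(k)
    grad_m = r * (mu - y) / (r + mu)  # chain rule: d/dm = mu * d/dmu
    d_dr = digamma(r) - digamma(y + r) + np.log((r + mu) / r) - 1.0 + (r + y) / (r + mu)
    grad_k = r * d_dr  # chain rule: d/dk = r * d/dr
    return grad_m, grad_k


y = np.array([0.0, 3.0, 12.0])
m = np.log(np.array([2.0, 3.0, 5.0]))
k = np.log(np.array([1.5, 4.0, 0.8]))
h = 1e-6
gm, gk = negbin_grad(y, m, k)
fd_m = (negbin_nll(y, m + h, k) - negbin_nll(y, m - h, k)) / (2 * h)
fd_k = (negbin_nll(y, m, k + h) - negbin_nll(y, m, k - h)) / (2 * h)
print(np.allclose(gm, fd_m), np.allclose(gk, fd_k))
```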
prob_exceed
Probability that Y > threshold.
Very useful for demand planning: "What's the probability we need more than 100 units?"
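One way to compute this, assuming SciPy is available: scipy.stats.nbinom(n, p) has mean n(1 - p)/p, so n = r and p = r/(r + μ) reproduce the (μ, r) parameterization, and its survival function sf(k) is exactly P(Y > k). The prob_exceed function below is a stand-alone, hypothetical version, not the method's actual signature.

```python
import numpy as np
from scipy.stats import nbinom


def prob_exceed(threshold, mu, r):
    """P(Y > threshold) for NegativeBinomial(mu, r); hypothetical stand-alone helper."""
    # scipy's nbinom(n, p) has mean n(1 - p)/p, so n = r, p = r/(r + mu) gives mean mu.
    p = r / (r + mu)
    return nbinom.sf(threshold, r, p)  # sf(k) = P(Y > k)


mu = np.array([60.0, 90.0, 120.0])  # predicted mean demand for three stores
r = np.array([5.0, 5.0, 5.0])
print(prob_exceed(100, mu, r))  # probability of needing more than 100 units
```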
Custom Distributions
create_custom_distribution
Convenience function to create a custom distribution.
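Example: Model y ~ Normal(A * exp(-B*x_feature), sigma), where x_feature is a feature column aligned with y
>>> import numpy as np
>>> x_feature = X[:, 0]  # hypothetical: nll_fn only receives (y, params), so the feature is captured by closure
>>> dist = create_custom_distribution(
...     param_names=['A', 'B', 'sigma'],
...     link_functions={'A': 'exp', 'B': 'softplus', 'sigma': 'exp'},
...     nll_fn=lambda y, p: 0.5*np.log(2*np.pi*p['sigma']**2) + (y - p['A']*np.exp(-p['B']*x_feature))**2/(2*p['sigma']**2),
...     mean_fn=lambda p: p['A']*np.exp(-p['B']*x_feature),
...     variance_fn=lambda p: p['sigma']**2,
... )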
CustomDistribution
CustomDistribution(
param_names,
link_functions,
nll_fn,
mean_fn=None,
variance_fn=None,
init_fn=None,
use_jax=True,
eps=1e-05,
)
Bases: Distribution
User-defined distribution with automatic gradient computation.
Define any parametric distribution by specifying:
1. Parameter names and link functions
2. A negative log-likelihood function
Gradients are computed automatically via:
- JAX autodiff (if available; fastest)
- Numerical differentiation (fallback)
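A sketch of what a numerical-differentiation fallback could look like, using central differences with the step eps (default 1e-05, as in the constructor). numerical_grad_hess and its argument layout are hypothetical. Note that it returns the observed second derivative; for the Normal scale this is 2(y - μ)²/σ², not the expected value 2 used by the built-in Normal.

```python
import numpy as np


def numerical_grad_hess(nll_fn, y, raw, links, eps=1e-5):
    """Central-difference gradient and diagonal hessian of nll_fn w.r.t. each raw parameter.

    Hypothetical sketch: `raw` maps parameter names to raw arrays, `links` maps
    names to raw -> constrained functions, and nll_fn(y, params) returns per-sample NLL.
    """

    def nll_at(raw_values):
        params = {name: links[name](value) for name, value in raw_values.items()}
        return nll_fn(y, params)

    base = nll_at(raw)
    out = {}
    for name in raw:
        # Shifting a whole array at once is valid because the NLL is per-sample.
        f_up = nll_at({**raw, name: raw[name] + eps})
        f_down = nll_at({**raw, name: raw[name] - eps})
        grad = (f_up - f_down) / (2 * eps)
        hess = (f_up - 2 * base + f_down) / eps**2  # observed (not expected) hessian
        out[name] = (grad, hess)
    return out


def normal_nll(y, p):
    return 0.5 * np.log(2 * np.pi * p["sigma"] ** 2) + (y - p["mu"]) ** 2 / (2 * p["sigma"] ** 2)


links = {"mu": lambda x: x, "sigma": np.exp}
y = np.array([0.5, -1.0])
raw = {"mu": np.array([0.0, 0.2]), "sigma": np.array([0.0, 0.3])}
gh = numerical_grad_hess(normal_nll, y, raw, links)
print(gh["mu"][0], gh["sigma"][0])
```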
Example: Custom "ratio" distribution y ~ Normal(A*(1-B)/C, σ)
>>> import numpy as np
>>> def my_nll(y, params):
... A, B, C, sigma = params['A'], params['B'], params['C'], params['sigma']
... mu = A * (1 - B) / C
... return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)
>>>
>>> dist = CustomDistribution(
... param_names=['A', 'B', 'C', 'sigma'],
... link_functions={
... 'A': 'identity', # A ∈ (-∞, ∞)
... 'B': 'sigmoid', # B ∈ (0, 1)
... 'C': 'softplus', # C > 0
... 'sigma': 'exp', # σ > 0
... },
... nll_fn=my_nll,
... mean_fn=lambda params: params['A'] * (1 - params['B']) / params['C'],
... )
>>>
>>> model = NGBoost(distribution=dist, n_trees=100)
>>> model.fit(X, y)
For Kaggle competitions with custom evaluation metrics, you can define the NLL to match the competition metric!
Initialize custom distribution.
| PARAMETER | DESCRIPTION |
|---|---|
| param_names | List of parameter names (e.g., ['A', 'B', 'sigma']) |
| link_functions | Dict mapping each parameter name to a link type: 'identity' (no transformation, param ∈ (-∞, ∞)); 'exp' (exponential, param > 0); 'softplus' (log(1 + exp(x)), param > 0, smoother than exp); 'sigmoid' (1/(1 + exp(-x)), param ∈ (0, 1)); 'square' (x², param ≥ 0) |
| nll_fn | Function (y, params_dict) -> array of NLL per sample |
| mean_fn | Optional function (params_dict) -> mean prediction |
| variance_fn | Optional function (params_dict) -> variance |
| init_fn | Optional function (y) -> dict of initial raw param values |
| use_jax | Try to use JAX for autodiff (falls back to numerical differentiation if unavailable) |
| eps | Epsilon (step size) for numerical gradients |
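The five link types can be written as plain NumPy functions, together with their inverses (constrained -> raw, as used by link_inv for initialization). This mapping is an illustrative sketch, not the library's internal table; the overflow-safe softplus via np.logaddexp is an implementation choice made here.

```python
import numpy as np

# Illustrative raw -> constrained links for the five link types listed above.
LINKS = {
    "identity": lambda x: x,
    "exp": np.exp,
    "softplus": lambda x: np.logaddexp(0.0, x),  # log(1 + exp(x)) without overflow
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "square": np.square,
}

# Inverses (constrained -> raw), e.g. for turning initial guesses into raw values.
LINK_INVERSES = {
    "identity": lambda p: p,
    "exp": np.log,
    "softplus": lambda p: np.log(np.expm1(p)),  # valid for p > 0
    "sigmoid": lambda p: np.log(p) - np.log1p(-p),  # logit, valid for 0 < p < 1
    "square": np.sqrt,  # returns the non-negative root
}

raw = np.array([-3.0, 0.0, 2.5])
for name, link in LINKS.items():
    print(name, link(raw))
```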
Utilities
get_distribution
Get distribution by name or return instance.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Distribution name or Distribution instance |

| RETURNS | DESCRIPTION |
|---|---|
| Distribution | Distribution instance |
Example:
>>> dist = get_distribution('normal')
>>> dist = get_distribution('gamma')
list_distributions
Base Classes
Distribution
Bases: ABC
Base class for probability distributions.
Subclasses must implement:
- n_params: Number of distributional parameters
- param_names: Names of parameters
- link: Transform raw -> constrained parameter space
- link_inv: Transform constrained -> raw
- nll_gradient: Gradient and hessian of NLL w.r.t. raw parameters
- fisher_information: Fisher information matrix (for NGBoost)
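The contract above can be illustrated with a toy subclass. This is a sketch: the stand-in Distribution base mirrors only the listed members, the method signatures are inferred from the method documentation that follows, and Exponential is not one of the built-in distributions. With rate = exp(raw), NLL = -raw + exp(raw)·y, so the gradient is -1 + rate·y, the hessian is rate·y, and its expectation (the Fisher information) is 1.

```python
from abc import ABC, abstractmethod

import numpy as np


class Distribution(ABC):
    """Stand-in base class mirroring only the members listed above (hypothetical signatures)."""

    n_params: int
    param_names: list

    @abstractmethod
    def link(self, param_name, raw):
        """Raw -> constrained parameter space."""

    @abstractmethod
    def link_inv(self, param_name, param):
        """Constrained -> raw parameter space."""

    @abstractmethod
    def nll_gradient(self, y, params):
        """Return {param_name: (gradient, hessian)} w.r.t. raw parameters."""

    @abstractmethod
    def fisher_information(self, params):
        """Return an array of shape (n_samples, n_params, n_params)."""


class Exponential(Distribution):
    """Toy Exponential(rate) with an exp link; NLL = -log(rate) + rate * y."""

    n_params = 1
    param_names = ["rate"]

    def link(self, param_name, raw):
        return np.exp(raw)

    def link_inv(self, param_name, param):
        return np.log(param)

    def nll_gradient(self, y, params):
        rate = params["rate"]
        # With rate = exp(raw): d(NLL)/d(raw) = -1 + rate*y, d2(NLL)/d(raw)^2 = rate*y.
        return {"rate": (-1.0 + rate * y, rate * y)}

    def fisher_information(self, params):
        # E[rate * y] = 1 under the model, so F = 1 for every sample.
        n = np.shape(params["rate"])[0]
        return np.ones((n, 1, 1))


dist = Exponential()
y = np.array([0.5, 2.0])
params = {"rate": dist.link("rate", np.array([0.0, np.log(0.5)]))}
grad, hess = dist.nll_gradient(y, params)["rate"]
print(grad, hess)
```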
link (abstractmethod)
Apply link function: raw -> constrained parameter space.
E.g., for scale: exp(raw) to ensure positivity.
| PARAMETER | DESCRIPTION |
|---|---|
| param_name | Name of the parameter |
| raw | Raw (unbounded) values |

| RETURNS | DESCRIPTION |
|---|---|
| NDArray | Constrained parameter values |
link_inv (abstractmethod)
Inverse link: constrained -> raw (for initialization).
| PARAMETER | DESCRIPTION |
|---|---|
| param_name | Name of the parameter |
| param | Constrained parameter values |

| RETURNS | DESCRIPTION |
|---|---|
| NDArray | Raw (unbounded) values |
nll_gradient (abstractmethod)
Compute gradient and hessian of NLL w.r.t. each RAW parameter.
The gradient is d(NLL)/d(raw), accounting for the link function.
| PARAMETER | DESCRIPTION |
|---|---|
| y | Observed target values |
| params | Dictionary of constrained parameter values |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, GradHess] | Dictionary mapping param_name -> (gradient, hessian) |
fisher_information (abstractmethod)
Fisher information matrix at given parameters.
Shape: (n_samples, n_params, n_params). Used for natural gradient computation in NGBoost.
| PARAMETER | DESCRIPTION |
|---|---|
| params | Dictionary of constrained parameter values |

| RETURNS | DESCRIPTION |
|---|---|
| NDArray | Fisher information matrix |
natural_gradient
Compute natural gradient: F^{-1} @ ordinary_gradient.
The natural gradient accounts for the geometry of the space of distributions (it does not depend on how the parameters happen to be scaled), which leads to faster and more stable convergence. This is the key insight of NGBoost.
| PARAMETER | DESCRIPTION |
|---|---|
| y | Observed target values |
| params | Dictionary of constrained parameter values |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, GradHess] | Dictionary mapping param_name -> (natural_gradient, hessian) |
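A batched sketch of F⁻¹ @ gradient using np.linalg.solve (solving the linear system is cheaper and more stable than forming the inverse). natural_gradient here is a hypothetical free function, and the hessian half of the returned pair is not shown. For the Normal with F = diag(1/σ², 2), the natural gradient for loc reduces to -(y - μ), so the 1/σ² scaling cancels.

```python
import numpy as np


def natural_gradient(grads, fisher):
    """Solve F @ g_nat = g independently for every sample.

    grads:  (n_samples, n_params) ordinary gradients w.r.t. raw parameters
    fisher: (n_samples, n_params, n_params) Fisher information matrices
    """
    # solve broadcasts over the leading axis; the trailing axis of size 1
    # makes each right-hand side an explicit column vector.
    return np.linalg.solve(fisher, grads[..., None])[..., 0]


# Normal example with raw parameters (loc, log sigma): F = diag(1/sigma^2, 2).
y = np.array([1.0, -0.5, 3.0])
mu = np.array([0.0, 0.0, 1.0])
sigma = np.array([1.0, 2.0, 0.5])
grads = np.stack([-(y - mu) / sigma**2, 1 - (y - mu) ** 2 / sigma**2], axis=1)
fisher = np.zeros((3, 2, 2))
fisher[:, 0, 0] = 1 / sigma**2
fisher[:, 1, 1] = 2.0
nat = natural_gradient(grads, fisher)
print(nat[:, 0])  # matches -(y - mu)
```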
init_params
Initialize parameters from target values.
Returns raw (pre-link) initial values for each parameter.
| PARAMETER | DESCRIPTION |
|---|---|
| y | Target values for initialization |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, float] | Dictionary mapping param_name -> initial raw value |
DistributionOutput (dataclass)
Container for distribution parameter predictions.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| params | Dictionary mapping parameter names to predicted values |
| distribution | The Distribution instance used |
interval
Compute the (1 - alpha) prediction interval.
| PARAMETER | DESCRIPTION |
|---|---|
| alpha | Significance level (0.1 = 90% interval) |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[NDArray, NDArray] | (lower, upper) bounds |
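The section does not say how the interval is constructed. Assuming an equal-tailed interval from the predicted distribution's quantiles, the Normal case reduces to μ ± z·σ with z the (1 - alpha/2) standard-normal quantile. normal_interval is a hypothetical helper; it uses the standard library's statistics.NormalDist for the quantile.

```python
from statistics import NormalDist

import numpy as np


def normal_interval(loc, scale, alpha=0.1):
    """Equal-tailed (1 - alpha) interval for Normal(loc, scale) predictions."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 when alpha = 0.1
    loc = np.asarray(loc, dtype=float)
    scale = np.asarray(scale, dtype=float)
    return loc - z * scale, loc + z * scale


lower, upper = normal_interval([10.0, 20.0], [1.0, 3.0], alpha=0.1)
print(lower, upper)
```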
sample
Draw samples from the predicted distribution.
| PARAMETER | DESCRIPTION |
|---|---|
| n_samples | Number of samples per observation |
| seed | Random seed for reproducibility |

| RETURNS | DESCRIPTION |
|---|---|
| samples | Array of shape (n_observations, n_samples) |
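A sketch of the documented output shape for a Normal prediction, assuming NumPy's Generator handles the seeding (the library's sampler may differ). Broadcasting (n_observations, 1) parameters against size=(n_observations, n_samples) gives one row of draws per observation; sample_normal is a hypothetical helper.

```python
import numpy as np


def sample_normal(loc, scale, n_samples, seed=None):
    """Draw n_samples per observation from Normal(loc, scale)."""
    rng = np.random.default_rng(seed)
    loc = np.asarray(loc, dtype=float)[:, None]  # (n_obs, 1) broadcasts over samples
    scale = np.asarray(scale, dtype=float)[:, None]
    return rng.normal(loc, scale, size=(loc.shape[0], n_samples))


s = sample_normal([0.0, 5.0, -2.0], [1.0, 0.5, 2.0], n_samples=1000, seed=0)
print(s.shape)  # (3, 1000)
# The same seed reproduces the same draws.
print(np.array_equal(s, sample_normal([0.0, 5.0, -2.0], [1.0, 0.5, 2.0], 1000, seed=0)))  # True
```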
nll
Negative log-likelihood for observed values.
| PARAMETER | DESCRIPTION |
|---|---|
| y | Observed values |

| RETURNS | DESCRIPTION |
|---|---|
| nll | Per-sample negative log-likelihood |