Distributions

Probability distributions for NaturalBoost.

Built-in Distributions

Normal

Normal

Bases: Distribution

Normal (Gaussian) distribution.

PARAMETER DESCRIPTION
loc

Mean, unbounded

SYMBOL: μ

scale

Standard deviation, must be positive

SYMBOL: σ

PDF: p(y) = (1/√(2πσ²)) exp(-(y-μ)²/(2σ²))

NLL: 0.5 * log(2πσ²) + (y-μ)²/(2σ²)

init_params

init_params(y)

Initialize with sample mean and std.

nll_gradient

nll_gradient(y, params)

Compute gradients of NLL w.r.t. raw parameters.

NLL = 0.5 * log(2πσ²) + (y - μ)² / (2σ²)

For loc (identity link):
d(NLL)/dμ = -(y - μ) / σ²
d²(NLL)/dμ² = 1 / σ²

For scale with exp link (σ = exp(s)):
d(NLL)/ds = 1 - (y - μ)² / σ²
d²(NLL)/ds² ≈ 2 (expected Hessian: the exact value is 2(y - μ)²/σ², and E[(y - μ)²] = σ²)
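The formulas above can be checked numerically. The following is a minimal sketch, not the library's implementation: `normal_nll` and `normal_grad_hess` are hypothetical helper names, and the raw scale `s` is assumed to satisfy σ = exp(s) as stated above. It compares the analytic gradients with central finite differences.

```python
import numpy as np

def normal_nll(y, mu, s):
    """Per-sample NLL of Normal(mu, sigma) with sigma = exp(s)."""
    sigma2 = np.exp(2.0 * s)
    return 0.5 * np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / (2.0 * sigma2)

def normal_grad_hess(y, mu, s):
    """Analytic (gradient, hessian) pairs for the raw parameters (hypothetical helper)."""
    sigma2 = np.exp(2.0 * s)
    grad_mu = -(y - mu) / sigma2
    hess_mu = np.ones_like(y) / sigma2
    grad_s = 1.0 - (y - mu) ** 2 / sigma2
    hess_s = np.full_like(y, 2.0)  # expected value of 2 * (y - mu)**2 / sigma2
    return {"loc": (grad_mu, hess_mu), "scale": (grad_s, hess_s)}

y = np.array([0.3, 1.7, -0.4])
mu, s, h = 0.5, np.log(1.2), 1e-6
analytic = normal_grad_hess(y, mu, s)

# Central finite differences of the NLL in each raw parameter.
fd_mu = (normal_nll(y, mu + h, s) - normal_nll(y, mu - h, s)) / (2 * h)
fd_s = (normal_nll(y, mu, s + h) - normal_nll(y, mu, s - h)) / (2 * h)

print(np.allclose(analytic["loc"][0], fd_mu, atol=1e-6))
print(np.allclose(analytic["scale"][0], fd_s, atol=1e-6))
```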

fisher_information

fisher_information(params)

Fisher information matrix for Normal distribution.

For Normal with exp link on scale:

F = [[1/σ², 0],
     [0,    2]]

The off-diagonal is 0 because mean and variance are orthogonal parameters in the normal family.

quantile

quantile(params, q)

Quantile of Normal distribution.

sample

sample(params, n_samples=1, seed=None)

Sample from Normal distribution.

nll

nll(y, params)

Negative log-likelihood.

LogNormal

LogNormal

Bases: Distribution

Log-Normal distribution for positive continuous data.

If X ~ LogNormal(μ, σ), then log(X) ~ Normal(μ, σ).

PARAMETER DESCRIPTION
loc

Mean of underlying normal

SYMBOL: μ

scale

Std of underlying normal (must be positive)

SYMBOL: σ

Mean: exp(μ + σ²/2)

Variance: (exp(σ²) - 1) * exp(2μ + σ²)
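As a quick sanity check of these moment formulas, the sketch below compares them with a Monte Carlo estimate. `lognormal_mean_var` is a hypothetical helper written for illustration, not part of the documented API.

```python
import numpy as np

def lognormal_mean_var(mu, sigma):
    """Closed-form mean and variance of LogNormal(mu, sigma)."""
    mean = np.exp(mu + sigma ** 2 / 2.0)
    var = (np.exp(sigma ** 2) - 1.0) * np.exp(2.0 * mu + sigma ** 2)
    return mean, var

mu, sigma = 0.2, 0.4
rng = np.random.default_rng(0)
# If log(X) ~ Normal(mu, sigma), then X = exp(Normal draw) is log-normal.
x = np.exp(rng.normal(mu, sigma, size=400_000))

mean, var = lognormal_mean_var(mu, sigma)
print(mean, x.mean())
print(var, x.var())
```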

init_params

init_params(y)

Initialize from positive target values.

nll_gradient

nll_gradient(y, params)

Gradients for LogNormal.

NLL = log(y) + 0.5*log(2πσ²) + (log(y) - μ)²/(2σ²)

Same gradients as Normal but with log(y) as target.

fisher_information

fisher_information(params)

Same as Normal (parameters are for underlying normal).

Gamma

Gamma

Bases: Distribution

Gamma distribution for positive continuous data.

Parameterization: shape (α) and rate (β)
- Mean = α/β
- Variance = α/β²

PARAMETER DESCRIPTION
concentration

Shape parameter, must be positive

SYMBOL: α

rate

Rate parameter, must be positive

SYMBOL: β

Link functions: exp for both (ensure positivity)

Alternative: Can also be parameterized by mean and dispersion.

init_params

init_params(y)

Initialize using method of moments.

mean = α/β, var = α/β² => β = mean/var, α = mean * β = mean²/var
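The method-of-moments step can be sketched as follows. This is an illustration under assumptions: `gamma_method_of_moments` is a hypothetical name, and the conversion to raw values assumes the exp links described above (raw = log of the constrained value).

```python
import numpy as np

def gamma_method_of_moments(y):
    """Return (alpha, beta) matching the sample mean and variance of y."""
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var()
    beta = mean / var      # rate
    alpha = mean * beta    # shape, equal to mean**2 / var
    return alpha, beta

rng = np.random.default_rng(0)
# NumPy parameterizes Gamma by shape and scale, so scale = 1 / rate.
y = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=50_000)

alpha_hat, beta_hat = gamma_method_of_moments(y)
# Assumed exp links: raw initial values are the logs of alpha and beta.
raw_init = {"concentration": np.log(alpha_hat), "rate": np.log(beta_hat)}
print(alpha_hat, beta_hat, raw_init)
```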

nll_gradient

nll_gradient(y, params)

Gradients for Gamma distribution.

NLL = -α*log(β) + log(Γ(α)) - (α-1)*log(y) + β*y

d(NLL)/dα = -log(β) + ψ(α) - log(y)
d(NLL)/dβ = -α/β + y

where ψ is the digamma function.

With exp links (α = exp(a), β = exp(b)):
d(NLL)/da = α * (-log(β) + ψ(α) - log(y))
d(NLL)/db = β * (-α/β + y) = -α + β*y

fisher_information

fisher_information(params)

Fisher information for Gamma (with exp links).

Poisson

Poisson

Bases: Distribution

Poisson distribution for count data.

Single parameter: rate (λ)
- Mean = λ
- Variance = λ

Link function: exp (ensures λ > 0)

nll_gradient

nll_gradient(y, params)

Gradients for Poisson.

NLL = λ - y*log(λ) + log(y!)
d(NLL)/dλ = 1 - y/λ

With exp link (λ = exp(l)):
d(NLL)/dl = λ - y
d²(NLL)/dl² = λ
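To see how the raw-space gradient and hessian behave, the sketch below runs Newton steps on a single shared raw rate. `poisson_grad_hess` is a hypothetical helper, and the loop is only an illustration of the formulas, not how the boosting model fits. The iterate converges to log(mean(y)), where the summed gradient λ - y vanishes.

```python
import numpy as np

def poisson_grad_hess(y, l):
    """Gradient and hessian of the Poisson NLL w.r.t. the raw rate l (lambda = exp(l))."""
    lam = np.exp(l)
    return lam - y, lam

y = np.array([2.0, 4.0, 3.0, 1.0, 5.0])
l = np.zeros_like(y)  # one shared raw value, stored per sample

for _ in range(25):
    grad, hess = poisson_grad_hess(y, l)
    l -= grad.sum() / hess.sum()  # same Newton step applied to every sample

print(np.exp(l[0]), y.mean())
```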

fisher_information

fisher_information(params)

Fisher information for Poisson: F = λ.

StudentT

StudentT

Bases: Distribution

Student-t distribution for heavy-tailed data.

PARAMETER DESCRIPTION
loc

Location parameter

SYMBOL: μ

scale

Scale parameter (positive)

SYMBOL: σ

df

Degrees of freedom (positive, typically > 2)

SYMBOL: ν

For ν → ∞, approaches Normal distribution. Lower ν = heavier tails.

nll_gradient

nll_gradient(y, params)

Gradients for Student-t (simplified, using expected hessians).

fisher_information

fisher_information(params)

Fisher information for Student-t (diagonal approximation).

Tweedie

Tweedie

Tweedie(power=1.5)

Bases: Distribution

Tweedie distribution for zero-inflated positive continuous data.

Key use case: Insurance claims, revenue forecasting with zeros.

Popular in Kaggle competitions:
- Porto Seguro Safe Driver Prediction
- Allstate Claims Severity
- Any competition with zero-inflated positive targets

The Tweedie family is indexed by its variance power ρ:
- ρ = 1: Poisson (count data)
- 1 < ρ < 2: Compound Poisson-Gamma (zeros + positive continuous)
- ρ = 2: Gamma (positive continuous)

PARAMETER DESCRIPTION
mu

Mean parameter (positive)

SYMBOL: μ

phi

Dispersion parameter (positive)

SYMBOL: φ

Why better than XGBoost?
- XGBoost Tweedie only outputs point estimates
- NGBoost Tweedie outputs a full distribution → prediction intervals, uncertainty quantification, probabilistic forecasts

Initialize Tweedie with power parameter.

PARAMETER DESCRIPTION
power

Variance power (1 < power < 2 for compound Poisson-Gamma). The default of 1.5 is a common choice in Kaggle competitions.

TYPE: float DEFAULT: 1.5

init_params

init_params(y)

Initialize from target values.

For Tweedie, μ = E[Y], and φ is estimated from variance.

nll_gradient

nll_gradient(y, params)

Gradients for Tweedie distribution.

Using the deviance formulation (standard in GLMs).

For Tweedie with power ρ:
d(NLL)/dμ = (μ^(1-ρ) - y*μ^(-ρ)) / φ
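The gradient above follows from differentiating the μ-dependent part of the deviance-style NLL. The sketch below is a minimal check under assumptions: `tweedie_nll` keeps only the terms that depend on μ, and both function names are hypothetical rather than the library's own.

```python
import numpy as np

def tweedie_nll(y, mu, phi, power=1.5):
    """Deviance-style Tweedie NLL, keeping only the terms that depend on mu."""
    return (-y * mu ** (1.0 - power) / (1.0 - power)
            + mu ** (2.0 - power) / (2.0 - power)) / phi

def tweedie_grad_mu(y, mu, phi, power=1.5):
    """d(NLL)/d(mu) = (mu^(1-p) - y * mu^(-p)) / phi."""
    return (mu ** (1.0 - power) - y * mu ** (-power)) / phi

y = np.array([0.0, 1.5, 4.0])  # zeros are allowed for 1 < power < 2
mu, phi, h = 2.0, 1.3, 1e-6

analytic = tweedie_grad_mu(y, mu, phi)
finite_diff = (tweedie_nll(y, mu + h, phi) - tweedie_nll(y, mu - h, phi)) / (2 * h)
print(np.allclose(analytic, finite_diff, atol=1e-6))
```

Note that the gradient is zero when μ equals y, as expected for a mean parameter.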

fisher_information

fisher_information(params)

Fisher information for Tweedie.

quantile

quantile(params, q)

Approximate quantile using Normal approximation.

sample

sample(params, n_samples=1, seed=None)

Sample from Tweedie using compound Poisson-Gamma.
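One way compound Poisson-Gamma sampling can work is sketched below. This is an illustration, not the library's code: `sample_tweedie` is a hypothetical standalone function for scalar μ and φ, and it uses the standard mapping from (μ, φ, ρ) to a Poisson rate and per-event Gamma shape and scale, which gives E[Y] = μ and Var[Y] = φ μ^ρ.

```python
import numpy as np

def sample_tweedie(mu, phi, power=1.5, n_samples=1, seed=None):
    """Draw Tweedie samples (1 < power < 2) as a Poisson sum of Gamma variables."""
    rng = np.random.default_rng(seed)
    lam = mu ** (2.0 - power) / (phi * (2.0 - power))   # Poisson rate (number of events)
    shape = (2.0 - power) / (power - 1.0)               # Gamma shape per event
    scale = phi * (power - 1.0) * mu ** (power - 1.0)   # Gamma scale per event

    counts = rng.poisson(lam, size=n_samples)
    out = np.zeros(n_samples)
    positive = counts > 0
    # A sum of k iid Gamma(shape, scale) draws is Gamma(k * shape, scale);
    # samples with zero events stay exactly 0.
    out[positive] = rng.gamma(shape * counts[positive], scale)
    return out

samples = sample_tweedie(mu=2.0, phi=1.0, power=1.5, n_samples=10_000, seed=0)
print(samples.mean(), (samples == 0).mean())
```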

nll

nll(y, params)

Negative log-likelihood (deviance-based).

NegativeBinomial

NegativeBinomial

Bases: Distribution

Negative Binomial distribution for overdispersed count data.

Key use case: Sales forecasting, demand prediction, click counts.

Popular in Kaggle competitions:
- Rossmann Store Sales
- Bike Sharing Demand
- Grupo Bimbo Inventory Demand
- Any competition with count data where variance > mean

Compared to Poisson:
- Poisson: Var(Y) = Mean(Y)
- NegBin: Var(Y) = Mean(Y) + Mean(Y)²/r (overdispersion)

PARAMETER DESCRIPTION
mu

Mean parameter (positive)

SYMBOL: μ

r

Dispersion parameter (positive, smaller = more overdispersion)

Why better than XGBoost?
- XGBoost can't output count distributions at all
- NGBoost NegBin outputs a full distribution → prediction intervals, probability of exceeding thresholds, demand planning

init_params

init_params(y)

Initialize using method of moments.

Mean = μ
Var = μ + μ²/r => r = μ² / (Var - μ)
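A minimal sketch of this moment-matching step follows. `negbin_method_of_moments` is a hypothetical name, and the `min_excess` guard for data with Var ≤ Mean is an assumption added here so that r stays finite and positive; the library may handle that case differently.

```python
import numpy as np

def negbin_method_of_moments(y, min_excess=1e-6):
    """Return (mu, r) from the sample mean and variance of count data y."""
    y = np.asarray(y, dtype=float)
    mu, var = y.mean(), y.var()
    # Guard (assumption): if var <= mu there is no overdispersion to fit,
    # so clamp the excess variance to a small positive value (very large r).
    excess = max(var - mu, min_excess)
    return mu, mu ** 2 / excess

rng = np.random.default_rng(0)
true_mu, true_r = 10.0, 4.0
# NumPy's negative_binomial(n, p) has mean n * (1 - p) / p, so p = r / (r + mu).
y = rng.negative_binomial(true_r, true_r / (true_r + true_mu), size=200_000)

mu_hat, r_hat = negbin_method_of_moments(y)
print(mu_hat, r_hat)
```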

nll_gradient

nll_gradient(y, params)

Gradients for Negative Binomial.

NLL = -log Γ(y+r) + log Γ(r) + log Γ(y+1) - r*log(r/(r+μ)) - y*log(μ/(r+μ))

fisher_information

fisher_information(params)

Fisher information for Negative Binomial.

prob_exceed

prob_exceed(params, threshold)

Probability that Y > threshold.

Very useful for demand planning: "What's the probability we need more than 100 units?"
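For intuition, an exceedance probability of this kind can be computed by summing the PMF up to the threshold. The sketch below is a standalone illustration for scalar μ and r, using the (μ, r) parameterization from the NLL above; `negbin_log_pmf` and `negbin_prob_exceed` are hypothetical names, not the library's functions.

```python
from math import exp, floor, lgamma, log

def negbin_log_pmf(k, mu, r):
    """log P(Y = k) for the Negative Binomial with mean mu and dispersion r."""
    return (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
            + r * log(r / (r + mu)) + k * log(mu / (r + mu)))

def negbin_prob_exceed(mu, r, threshold):
    """P(Y > threshold) = 1 - sum of the PMF over k = 0..floor(threshold)."""
    k_max = floor(threshold)
    if k_max < 0:
        return 1.0  # counts are never negative
    cdf = sum(exp(negbin_log_pmf(k, mu, r)) for k in range(k_max + 1))
    return max(0.0, 1.0 - cdf)

# "What's the probability we need more than 100 units?"
p = negbin_prob_exceed(mu=80.0, r=5.0, threshold=100)
print(p)
```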

Custom Distributions

create_custom_distribution

create_custom_distribution

create_custom_distribution(
    param_names,
    link_functions,
    nll_fn,
    mean_fn=None,
    variance_fn=None,
)

Convenience function to create a custom distribution.

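Example: Model y ~ Normal(A, sigma). (The intended model is y ~ Normal(A * exp(-B*x_feature), sigma), but nll_fn receives only (y, params), so in the NLL below the mean is A alone and B does not enter the likelihood.)

>>> import numpy as np
>>> dist = create_custom_distribution(
...     param_names=['A', 'B', 'sigma'],
...     link_functions={'A': 'exp', 'B': 'softplus', 'sigma': 'exp'},
...     nll_fn=lambda y, p: 0.5*np.log(2*np.pi*p['sigma']**2) + (y-p['A'])**2/(2*p['sigma']**2),
...     mean_fn=lambda p: p['A'],
...     variance_fn=lambda p: p['sigma']**2,
... )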

CustomDistribution

CustomDistribution

CustomDistribution(
    param_names,
    link_functions,
    nll_fn,
    mean_fn=None,
    variance_fn=None,
    init_fn=None,
    use_jax=True,
    eps=1e-05,
)

Bases: Distribution

User-defined distribution with automatic gradient computation.

Define any parametric distribution by specifying:
1. Parameter names and link functions
2. Negative log-likelihood function

Gradients are computed automatically via:
- JAX (if available) - fastest
- Numerical differentiation (fallback)

Example: Custom "ratio" distribution y ~ Normal(A*(1-B)/C, σ)

>>> import numpy as np
>>> def my_nll(y, params):
...     A, B, C, sigma = params['A'], params['B'], params['C'], params['sigma']
...     mu = A * (1 - B) / C
...     return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)
>>> 
>>> dist = CustomDistribution(
...     param_names=['A', 'B', 'C', 'sigma'],
...     link_functions={
...         'A': 'identity',      # A ∈ (-∞, ∞)
...         'B': 'sigmoid',       # B ∈ (0, 1)
...         'C': 'softplus',      # C > 0
...         'sigma': 'exp',       # σ > 0
...     },
...     nll_fn=my_nll,
...     mean_fn=lambda params: params['A'] * (1 - params['B']) / params['C'],
... )
>>> 
>>> model = NGBoost(distribution=dist, n_trees=100)
>>> model.fit(X, y)

For Kaggle competitions with custom evaluation metrics, you can define the NLL to match the competition metric!

Initialize custom distribution.

PARAMETER DESCRIPTION
param_names

List of parameter names (e.g., ['A', 'B', 'sigma'])

TYPE: list[str]

link_functions

Dict mapping param name to link type:
- 'identity': no transformation, param ∈ (-∞, ∞)
- 'exp': exponential, param > 0
- 'softplus': log(1 + exp(x)), param > 0 (smoother than exp)
- 'sigmoid': 1/(1+exp(-x)), param ∈ (0, 1)
- 'square': x², param ≥ 0
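The five link types listed above can be sketched in NumPy as follows. The `LINKS` table and `apply_links` helper are hypothetical names for illustration; the library's internal implementation may differ, for example in how it guards against overflow.

```python
import numpy as np

# Raw (unbounded) value -> constrained parameter, one entry per link type.
LINKS = {
    "identity": lambda x: x,
    "exp": np.exp,
    "softplus": lambda x: np.logaddexp(0.0, x),  # log(1 + exp(x)) without overflow
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "square": lambda x: x ** 2,
}

def apply_links(raw, link_functions):
    """Map a dict of raw arrays to constrained parameters using the named links."""
    return {
        name: LINKS[link_functions[name]](np.asarray(values, dtype=float))
        for name, values in raw.items()
    }

raw = {"A": [-1.0, 2.0], "B": [0.0, 3.0], "sigma": [0.0, -2.0]}
links = {"A": "identity", "B": "sigmoid", "sigma": "exp"}
params = apply_links(raw, links)
print(params)
```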

TYPE: dict[str, str]

nll_fn

Function (y, params_dict) -> array of NLL per sample

TYPE: callable

mean_fn

Optional function (params_dict) -> mean prediction

TYPE: callable | None DEFAULT: None

variance_fn

Optional function (params_dict) -> variance

TYPE: callable | None DEFAULT: None

init_fn

Optional function (y) -> dict of initial raw param values

TYPE: callable | None DEFAULT: None

use_jax

Try to use JAX for autodiff (falls back to numerical if unavailable)

TYPE: bool DEFAULT: True

eps

Epsilon for numerical gradients

TYPE: float DEFAULT: 1e-05

nll_gradient

nll_gradient(y, params)

Compute gradients (auto-selects JAX or numerical).
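The numerical fallback can be pictured as a central difference in each raw parameter. The sketch below is an assumption about the general technique, not the library's code: `numerical_raw_gradient` is a hypothetical function, and `links` is a plain dict of callables standing in for the named link types.

```python
import numpy as np

def numerical_raw_gradient(nll_fn, y, raw, links, eps=1e-5):
    """Central-difference d(NLL)/d(raw) per sample, one entry per parameter."""
    def nll_at(raw_values):
        # Apply each link before evaluating the user-supplied NLL.
        constrained = {k: links[k](v) for k, v in raw_values.items()}
        return nll_fn(y, constrained)

    grads = {}
    for name in raw:
        up = {**raw, name: raw[name] + eps}
        down = {**raw, name: raw[name] - eps}
        grads[name] = (nll_at(up) - nll_at(down)) / (2.0 * eps)
    return grads

def normal_nll(y, p):
    return 0.5 * np.log(2 * np.pi * p["sigma"] ** 2) + (y - p["mu"]) ** 2 / (2 * p["sigma"] ** 2)

links = {"mu": lambda x: x, "sigma": np.exp}
y = np.array([0.3, 1.7, -0.4])
raw = {"mu": np.full(3, 0.5), "sigma": np.full(3, np.log(1.2))}

grads = numerical_raw_gradient(normal_nll, y, raw, links)
print(grads["mu"], grads["sigma"])
```

Because the links are applied inside `nll_at`, the result is already the gradient with respect to the raw parameters, so no separate chain-rule step is needed.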

fisher_information

fisher_information(params)

Approximate Fisher information (diagonal).

quantile

quantile(params, q)

Approximate quantile using Normal assumption.

sample

sample(params, n_samples=1, seed=None)

Sample using Normal approximation.

Utilities

get_distribution

get_distribution

get_distribution(name)

Get distribution by name or return instance.

PARAMETER DESCRIPTION
name

Distribution name or Distribution instance

TYPE: str | Distribution

RETURNS DESCRIPTION
Distribution

Distribution instance

Example

dist = get_distribution('normal')
dist = get_distribution('gamma')

list_distributions

list_distributions

list_distributions()

List available distribution names.

Base Classes

Distribution

Distribution

Bases: ABC

Base class for probability distributions.

Subclasses must implement:
- n_params: Number of distributional parameters
- param_names: Names of parameters
- link: Transform raw -> constrained parameter space
- link_inv: Transform constrained -> raw
- nll_gradient: Gradient and hessian of NLL w.r.t. raw parameters
- fisher_information: Fisher information matrix (for NGBoost)

n_params abstractmethod property

n_params

Number of distributional parameters.

param_names abstractmethod property

param_names

Names of parameters, e.g., ['loc', 'scale'].

link

link(param_name, raw)

Apply link function: raw -> constrained parameter space.

E.g., for scale: exp(raw) to ensure positivity.

PARAMETER DESCRIPTION
param_name

Name of the parameter

TYPE: str

raw

Raw (unbounded) values

TYPE: NDArray

RETURNS DESCRIPTION
NDArray

Constrained parameter values

link_inv

link_inv(param_name, param)

Inverse link: constrained -> raw (for initialization).

PARAMETER DESCRIPTION
param_name

Name of the parameter

TYPE: str

param

Constrained parameter values

TYPE: NDArray

RETURNS DESCRIPTION
NDArray

Raw (unbounded) values

nll_gradient abstractmethod

nll_gradient(y, params)

Compute gradient and hessian of NLL w.r.t. each RAW parameter.

The gradient is d(NLL)/d(raw), accounting for the link function.

PARAMETER DESCRIPTION
y

Observed target values

TYPE: NDArray

params

Dictionary of constrained parameter values

TYPE: dict[str, NDArray]

RETURNS DESCRIPTION
dict[str, GradHess]

Dictionary mapping param_name -> (gradient, hessian)

fisher_information abstractmethod

fisher_information(params)

Fisher information matrix at given parameters.

Shape: (n_samples, n_params, n_params)

Used for natural gradient computation in NGBoost.

PARAMETER DESCRIPTION
params

Dictionary of constrained parameter values

TYPE: dict[str, NDArray]

RETURNS DESCRIPTION
NDArray

Fisher information matrix

natural_gradient

natural_gradient(y, params)

Compute natural gradient: F^{-1} @ ordinary_gradient.

Natural gradient accounts for the geometry of the parameter space, leading to faster convergence. This is the key insight of NGBoost.

PARAMETER DESCRIPTION
y

Observed target values

TYPE: NDArray

params

Dictionary of constrained parameter values

TYPE: dict[str, NDArray]

RETURNS DESCRIPTION
dict[str, GradHess]

Dictionary mapping param_name -> (natural_gradient, hessian)
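The F^{-1} @ gradient step can be sketched with a batched linear solve over the (n_samples, n_params, n_params) Fisher array. This standalone `natural_gradient` function works on plain arrays rather than the dictionaries used by the method above, and the small `ridge` term is an assumption added for numerical stability; neither is confirmed to match the library's internals.

```python
import numpy as np

def natural_gradient(grad, fisher, ridge=1e-8):
    """Solve F @ x = g per sample.

    grad:   (n_samples, n_params) ordinary gradients w.r.t. raw parameters
    fisher: (n_samples, n_params, n_params) Fisher information matrices
    """
    n_params = grad.shape[1]
    regularized = fisher + ridge * np.eye(n_params)  # assumed stabilizer
    # Solving is cheaper and more stable than forming an explicit inverse.
    return np.linalg.solve(regularized, grad[..., None])[..., 0]

# Normal example with exp link on scale: F = diag(1/sigma^2, 2).
y = np.array([1.0, -2.0])
mu = np.array([0.0, 0.0])
sigma = np.array([0.5, 2.0])

grad = np.stack([-(y - mu) / sigma ** 2, 1.0 - (y - mu) ** 2 / sigma ** 2], axis=1)
fisher = np.zeros((2, 2, 2))
fisher[:, 0, 0] = 1.0 / sigma ** 2
fisher[:, 1, 1] = 2.0

nat = natural_gradient(grad, fisher)
print(nat)
```

In this example the natural gradient for loc reduces to -(y - μ), so the step size no longer depends on σ; this rescaling is the geometric correction the method describes.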

init_params

init_params(y)

Initialize parameters from target values.

Returns raw (pre-link) initial values for each parameter.

PARAMETER DESCRIPTION
y

Target values for initialization

TYPE: NDArray

RETURNS DESCRIPTION
dict[str, float]

Dictionary mapping param_name -> initial raw value

mean abstractmethod

mean(params)

Expected value E[Y|params].

variance abstractmethod

variance(params)

Variance Var[Y|params].

quantile

quantile(params, q)

q-th quantile of the distribution.

sample

sample(params, n_samples=1, seed=None)

Sample from the distribution.

nll

nll(y, params)

Negative log-likelihood (for evaluation).

DistributionOutput

DistributionOutput dataclass

DistributionOutput(params, distribution)

Container for distribution parameter predictions.

ATTRIBUTE DESCRIPTION
params

Dictionary mapping parameter names to predicted values

TYPE: dict[str, NDArray]

distribution

The Distribution instance used

TYPE: Distribution

mean

mean()

Expected value E[Y|X].

variance

variance()

Variance Var[Y|X].

std

std()

Standard deviation.

quantile

quantile(q)

q-th quantile (0 < q < 1).

interval

interval(alpha=0.1)

(1-alpha) prediction interval.

PARAMETER DESCRIPTION
alpha

Significance level (0.1 = 90% interval)

TYPE: float DEFAULT: 0.1

RETURNS DESCRIPTION
tuple[NDArray, NDArray]

(lower, upper) bounds
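As an illustration of how alpha maps to the bounds, the sketch below builds an interval from two quantiles of a Normal prediction. It assumes an equal-tailed interval (quantiles at alpha/2 and 1 - alpha/2), which the description above does not state explicitly; `normal_interval` is a hypothetical helper using only the Python standard library.

```python
from statistics import NormalDist

def normal_interval(loc, scale, alpha=0.1):
    """Equal-tailed (1 - alpha) interval for Normal(loc, scale)."""
    dist = NormalDist(mu=loc, sigma=scale)
    return dist.inv_cdf(alpha / 2.0), dist.inv_cdf(1.0 - alpha / 2.0)

# alpha = 0.1 gives a 90% interval: 5% of the mass lies below and 5% above.
lower, upper = normal_interval(loc=10.0, scale=2.0, alpha=0.1)
print(lower, upper)
```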

sample

sample(n_samples=1, seed=None)

Draw samples from the predicted distribution.

PARAMETER DESCRIPTION
n_samples

Number of samples per observation

TYPE: int DEFAULT: 1

seed

Random seed for reproducibility

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
samples

Shape (n_observations, n_samples)

TYPE: NDArray

nll

nll(y)

Negative log-likelihood for observed values.

PARAMETER DESCRIPTION
y

Observed values

TYPE: NDArray

RETURNS DESCRIPTION
nll

Per-sample negative log-likelihood

TYPE: NDArray