Distributions¶

Probability distributions for NaturalBoost.

Built-in Distributions¶

Normal¶

Normal ¶

Bases: Distribution

Normal (Gaussian) distribution.

PARAMETER	DESCRIPTION
`loc` (μ)	Mean, unbounded
`scale` (σ)	Standard deviation, must be positive

Link functions

- loc: identity (unbounded)
- scale: exp (ensures σ > 0)

PDF: p(y) = (1/√(2πσ²)) exp(-(y-μ)²/(2σ²))

NLL: 0.5 * log(2πσ²) + (y-μ)²/(2σ²)

init_params ¶

init_params(y)

Initialize with sample mean and std.

nll_gradient ¶

nll_gradient(y, params)

Compute gradients of NLL w.r.t. raw parameters.

NLL = 0.5 * log(2πσ²) + (y - μ)² / (2σ²)

For loc (identity link):

- d(NLL)/dμ = -(y - μ) / σ²
- d²(NLL)/dμ² = 1 / σ²

For scale with exp link (σ = exp(s)):

- d(NLL)/ds = 1 - (y - μ)² / σ²
- d²(NLL)/ds² ≈ 2 (expected Hessian: the exact second derivative is 2(y - μ)² / σ², whose expectation under the model is 2)
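The gradient formulas above can be checked against finite differences. The following is a minimal scalar sketch, not the library's implementation (which operates on arrays); the helper names `normal_nll`, `normal_raw_gradients`, and `central_diff` are hypothetical.

```python
import math

def normal_nll(y, mu, s):
    """NLL of Normal(mu, sigma) with sigma = exp(s); s is the raw scale."""
    sigma = math.exp(s)
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)

def normal_raw_gradients(y, mu, s):
    """Analytic gradients w.r.t. the raw parameters (mu, s)."""
    sigma2 = math.exp(2 * s)
    grad_mu = -(y - mu) / sigma2
    grad_s = 1.0 - (y - mu) ** 2 / sigma2
    return grad_mu, grad_s

def central_diff(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

y, mu, s = 1.3, 0.4, -0.2
grad_mu, grad_s = normal_raw_gradients(y, mu, s)
num_mu = central_diff(lambda m: normal_nll(y, m, s), mu)
num_s = central_diff(lambda r: normal_nll(y, mu, r), s)
```

The analytic and numerical values should agree to within the finite-difference error.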

fisher_information ¶

fisher_information(params)

Fisher information matrix for Normal distribution.

For Normal with exp link on scale:

F = [[1/σ², 0],
     [0,    2]]

The off-diagonal is 0 because mean and variance are orthogonal parameters in the normal family.

quantile ¶

quantile(params, q)

Quantile of Normal distribution.

sample ¶

sample(params, n_samples=1, seed=None)

Sample from Normal distribution.

nll ¶

nll(y, params)

Negative log-likelihood.

LogNormal¶

LogNormal ¶

Bases: Distribution

Log-Normal distribution for positive continuous data.

If X ~ LogNormal(μ, σ), then log(X) ~ Normal(μ, σ).

PARAMETER	DESCRIPTION
`loc` (μ)	Mean of underlying normal
`scale` (σ)	Std of underlying normal (must be positive)

Link functions

- loc: identity
- scale: exp

Mean: exp(μ + σ²/2)

Variance: (exp(σ²) - 1) * exp(2μ + σ²)
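As a worked instance of the moment formulas above, the sketch below evaluates them for scalar parameters. The function name `lognormal_mean_variance` is hypothetical and not part of the documented API.

```python
import math

def lognormal_mean_variance(mu, sigma):
    """Mean and variance of LogNormal(mu, sigma) from the closed-form expressions."""
    mean = math.exp(mu + sigma**2 / 2)
    variance = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, variance

mean, variance = lognormal_mean_variance(mu=0.0, sigma=0.5)
median = math.exp(0.0)  # the median of a log-normal is exp(mu)
```

Note that the mean exceeds the median exp(μ) whenever σ > 0, which reflects the right skew of the distribution.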

init_params ¶

init_params(y)

Initialize from positive target values.

nll_gradient ¶

nll_gradient(y, params)

Gradients for LogNormal.

NLL = log(y) + 0.5*log(2πσ²) + (log(y) - μ)²/(2σ²)

Same gradients as Normal but with log(y) as target.

fisher_information ¶

fisher_information(params)

Same as Normal (parameters are for underlying normal).

Gamma¶

Gamma ¶

Bases: Distribution

Gamma distribution for positive continuous data.

Parameterization: shape (α) and rate (β)

- Mean = α/β
- Variance = α/β²

PARAMETER	DESCRIPTION
`concentration` (α)	Shape parameter, must be positive
`rate` (β)	Rate parameter, must be positive

Link functions: exp for both (ensure positivity)

Alternative: Can also be parameterized by mean and dispersion.

init_params ¶

init_params(y)

Initialize using method of moments.

mean = α/β, var = α/β² => β = mean/var, α = mean * β = mean²/var
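The method-of-moments step can be sketched as follows. This assumes the population variance is used and that the returned values are raw (log-transformed, since both links are exp); the library may differ on both points, and `gamma_init_raw` is a hypothetical name.

```python
import math
from statistics import fmean, pvariance

def gamma_init_raw(y):
    """Method-of-moments estimates mapped to raw space via log (inverse of exp)."""
    mean = fmean(y)
    var = pvariance(y)            # assumption: population variance
    rate = mean / var             # β = mean / var
    concentration = mean * rate   # α = mean² / var
    return {"concentration": math.log(concentration), "rate": math.log(rate)}

y = [0.8, 1.9, 2.4, 3.1, 0.5, 1.3, 4.2, 2.0]
raw = gamma_init_raw(y)
alpha, beta = math.exp(raw["concentration"]), math.exp(raw["rate"])
```

By construction, α/β reproduces the sample mean and α/β² the sample variance.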

nll_gradient ¶

nll_gradient(y, params)

Gradients for Gamma distribution.

NLL = -α*log(β) + log(Γ(α)) - (α-1)*log(y) + β*y

- d(NLL)/dα = -log(β) + ψ(α) - log(y)
- d(NLL)/dβ = -α/β + y

With exp links (α = exp(a), β = exp(b)):

- d(NLL)/da = α * (-log(β) + ψ(α) - log(y))
- d(NLL)/db = β * (-α/β + y) = -α + β*y
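The raw-space gradients above can be verified numerically. The sketch below uses only the standard library, so the digamma function ψ is approximated with a recurrence plus asymptotic series; all helper names are hypothetical, and a real implementation would more likely call an existing special-function routine.

```python
import math

def digamma(x):
    """ψ(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def gamma_nll(y, a, b):
    """Gamma NLL in terms of raw parameters a = log(α), b = log(β)."""
    alpha, beta = math.exp(a), math.exp(b)
    return -alpha * math.log(beta) + math.lgamma(alpha) - (alpha - 1) * math.log(y) + beta * y

def gamma_raw_gradients(y, a, b):
    """Analytic gradients w.r.t. the raw parameters (a, b)."""
    alpha, beta = math.exp(a), math.exp(b)
    grad_a = alpha * (-math.log(beta) + digamma(alpha) - math.log(y))
    grad_b = -alpha + beta * y
    return grad_a, grad_b

y, a, b, h = 2.5, 0.7, -0.3, 1e-5
grad_a, grad_b = gamma_raw_gradients(y, a, b)
num_a = (gamma_nll(y, a + h, b) - gamma_nll(y, a - h, b)) / (2 * h)
num_b = (gamma_nll(y, a, b + h) - gamma_nll(y, a, b - h)) / (2 * h)
```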

fisher_information ¶

fisher_information(params)

Fisher information for Gamma (with exp links).

Poisson¶

Poisson ¶

Bases: Distribution

Poisson distribution for count data.

Single parameter: rate (λ)

- Mean = λ
- Variance = λ

Link function: exp (ensures λ > 0)

nll_gradient ¶

nll_gradient(y, params)

Gradients for Poisson.

NLL = λ - y*log(λ) + log(y!)

d(NLL)/dλ = 1 - y/λ

With exp link (λ = exp(l)):

- d(NLL)/dl = λ - y
- d²(NLL)/dl² = λ
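To illustrate how the raw-space gradient and Hessian are used, the sketch below fits a single constant rate by Newton steps on the summed NLL. It is an illustration of the formulas only, not the boosting procedure itself; `poisson_newton_fit` is a hypothetical helper.

```python
import math

def poisson_newton_fit(y, n_steps=25):
    """Fit a constant raw rate (λ = exp(raw_rate)) by Newton steps on the summed NLL."""
    raw_rate = 0.0
    for _ in range(n_steps):
        lam = math.exp(raw_rate)
        grad = sum(lam - yi for yi in y)   # Σ d(NLL)/dl = Σ (λ - y)
        hess = len(y) * lam                # Σ d²(NLL)/dl² = n·λ
        raw_rate -= grad / hess
    return raw_rate

counts = [2, 0, 5, 3, 1, 4, 6, 3]
raw_hat = poisson_newton_fit(counts)
lam_hat = math.exp(raw_hat)
```

The fitted rate converges to the sample mean of the counts, which is the maximum-likelihood estimate for a constant Poisson rate.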

fisher_information ¶

fisher_information(params)

Fisher information for Poisson with respect to the raw (exp-link) parameter: F = λ.

StudentT¶

StudentT ¶

Bases: Distribution

Student-t distribution for heavy-tailed data.

PARAMETER	DESCRIPTION
`loc` (μ)	Location parameter
`scale` (σ)	Scale parameter (positive)
`df` (ν)	Degrees of freedom (positive, typically > 2)

For ν → ∞, approaches Normal distribution. Lower ν = heavier tails.

Link functions

- loc: identity
- scale: exp
- df: softplus (ensures > 0, typically > 2)

nll_gradient ¶

nll_gradient(y, params)

Gradients for Student-t (simplified, using expected hessians).

fisher_information ¶

fisher_information(params)

Fisher information for Student-t (diagonal approximation).

Tweedie¶

Tweedie ¶

Tweedie(power=1.5)

Bases: Distribution

Tweedie distribution for zero-inflated positive continuous data.

Key use case: Insurance claims, revenue forecasting with zeros.

Popular in Kaggle competitions:

- Porto Seguro Safe Driver Prediction
- Allstate Claims Severity
- Any competition with zero-inflated positive targets

The Tweedie distribution is a compound Poisson-Gamma:

- ρ = 1: Poisson (count data)
- 1 < ρ < 2: Compound Poisson-Gamma (zeros + positive continuous)
- ρ = 2: Gamma (positive continuous)

PARAMETER	DESCRIPTION
`mu` (μ)	Mean parameter (positive)
`phi` (φ)	Dispersion parameter (positive)

Why better than XGBoost?

- XGBoost Tweedie only outputs point estimates
- NGBoost Tweedie outputs full distribution → prediction intervals, uncertainty quantification, probabilistic forecasts

Link functions

- mu: log (ensures μ > 0)
- phi: log (ensures φ > 0)

Initialize Tweedie with power parameter.

PARAMETER	DESCRIPTION
`power`	Variance power (1 < power < 2 for compound Poisson-Gamma). 1.5 is a common default in Kaggle competitions. TYPE: `float` DEFAULT: `1.5`

init_params ¶

init_params(y)

Initialize from target values.

For Tweedie, μ = E[Y], and φ is estimated from variance.

nll_gradient ¶

nll_gradient(y, params)

Gradients for Tweedie distribution.

Using the deviance formulation (standard in GLMs).

For Tweedie with power ρ:

d(NLL)/dμ = (μ^(1-ρ) - y*μ^(-ρ)) / φ
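The gradient above is consistent with the usual negative quasi-log-likelihood for the Tweedie family. The sketch below assumes that form (the exact NLL used by the library is not shown here) and checks the derivative numerically; the function names are hypothetical.

```python
def tweedie_quasi_nll(y, mu, phi, power=1.5):
    """Assumed negative quasi-log-likelihood, dropping terms that do not depend on mu."""
    return (-y * mu ** (1 - power) / (1 - power) + mu ** (2 - power) / (2 - power)) / phi

def tweedie_grad_mu(y, mu, phi, power=1.5):
    """d(NLL)/dμ as stated in the formula above."""
    return (mu ** (1 - power) - y * mu ** (-power)) / phi

y, mu, phi, h = 3.0, 2.2, 0.8, 1e-6
numeric = (tweedie_quasi_nll(y, mu + h, phi) - tweedie_quasi_nll(y, mu - h, phi)) / (2 * h)
analytic = tweedie_grad_mu(y, mu, phi)
# Chain rule for the log link: d(NLL)/d(log μ) = μ · d(NLL)/dμ
raw_grad = mu * analytic
```

The gradient vanishes at μ = y, as expected for a mean parameter.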

fisher_information ¶

fisher_information(params)

Fisher information for Tweedie.

quantile ¶

quantile(params, q)

Approximate quantile using Normal approximation.

sample ¶

sample(params, n_samples=1, seed=None)

Sample from Tweedie using compound Poisson-Gamma.
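A compound Poisson-Gamma sampler can be sketched with the standard library alone. The parameter mapping below is the standard one that yields mean μ and variance φ·μ^ρ for 1 < ρ < 2; whether the library uses exactly this mapping is an assumption, and `sample_poisson` and `sample_tweedie` are hypothetical names.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_tweedie(mu, phi, power=1.5, n_samples=1, seed=None):
    """Draw Y = sum of N Gamma jumps, with N ~ Poisson(lam)."""
    rng = random.Random(seed)
    lam = mu ** (2 - power) / (phi * (2 - power))    # Poisson rate for the number of jumps
    shape = (2 - power) / (power - 1)                # Gamma shape of each jump
    scale = phi * (power - 1) * mu ** (power - 1)    # Gamma scale of each jump
    draws = []
    for _ in range(n_samples):
        n_jumps = sample_poisson(lam, rng)
        # With zero jumps the sum is empty, which gives an exact zero.
        draws.append(sum(rng.gammavariate(shape, scale) for _ in range(n_jumps)))
    return draws

draws = sample_tweedie(mu=2.0, phi=1.0, power=1.5, n_samples=20000, seed=0)
sample_mean = sum(draws) / len(draws)
```

The exact zeros arise whenever the Poisson count is zero, which is what makes this family suitable for targets with a point mass at zero.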

nll ¶

nll(y, params)

Negative log-likelihood (deviance-based).

NegativeBinomial¶

NegativeBinomial ¶

Bases: Distribution

Negative Binomial distribution for overdispersed count data.

Key use case: Sales forecasting, demand prediction, click counts.

Popular in Kaggle competitions:

- Rossmann Store Sales
- Bike Sharing Demand
- Grupo Bimbo Inventory Demand
- Any competition with count data where variance > mean

Compared to Poisson:

- Poisson: Var(Y) = Mean(Y)
- NegBin: Var(Y) = Mean(Y) + Mean(Y)²/r (overdispersion)

PARAMETER	DESCRIPTION
`mu` (μ)	Mean parameter (positive)
`r`	Dispersion parameter (positive, smaller = more overdispersion)

Why better than XGBoost?

- XGBoost can't output count distributions at all
- NGBoost NegBin outputs full distribution → prediction intervals, probability of exceeding thresholds, demand planning

Link functions

- mu: log (ensures μ > 0)
- r: log (ensures r > 0)

init_params ¶

init_params(y)

Initialize using method of moments.

Mean = μ

Var = μ + μ²/r => r = μ² / (Var - μ)
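The moment-matching step can be sketched as below. The fallback to a large r when the sample shows no overdispersion (Var ≤ μ) is an assumption made for this illustration, as are the use of the population variance and the name `negbin_init`; the library may handle these cases differently.

```python
import math
from statistics import fmean, pvariance

def negbin_init(y, max_r=1e6):
    """Method-of-moments estimates of (mu, r), returned in raw (log) space."""
    mu = fmean(y)
    var = pvariance(y)
    excess = var - mu
    # Assumption: with no overdispersion, use a large r (close to Poisson).
    r = mu**2 / excess if excess > 0 else max_r
    return {"mu": math.log(mu), "r": math.log(r)}

y = [0, 3, 1, 12, 4, 0, 7, 2, 9, 2]
raw = negbin_init(y)
mu, r = math.exp(raw["mu"]), math.exp(raw["r"])
```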

nll_gradient ¶

nll_gradient(y, params)

Gradients for Negative Binomial.

NLL = -log Γ(y+r) + log Γ(r) + log Γ(y+1) - r*log(r/(r+μ)) - y*log(μ/(r+μ))
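The NLL above can be evaluated directly with `math.lgamma`. The sketch below is a scalar illustration (the name `negbin_nll` is hypothetical) that also confirms the implied probabilities sum to one and reproduce the stated mean and variance.

```python
import math

def negbin_nll(y, mu, r):
    """Negative log-likelihood of a count y under NegBin(mu, r)."""
    return (
        -math.lgamma(y + r) + math.lgamma(r) + math.lgamma(y + 1)
        - r * math.log(r / (r + mu))
        - y * math.log(mu / (r + mu))
    )

mu, r = 4.0, 1.5
pmf = [math.exp(-negbin_nll(k, mu, r)) for k in range(2000)]  # tail beyond 2000 is negligible
total = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
```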

fisher_information ¶

fisher_information(params)

Fisher information for Negative Binomial.

prob_exceed ¶

prob_exceed(params, threshold)

Probability that Y > threshold.

Very useful for demand planning: "What's the probability we need more than 100 units?"
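An exceedance probability of this kind can be computed as one minus the cumulative probability up to the threshold. The sketch below is a scalar illustration under that definition; `negbin_log_pmf` and `negbin_prob_exceed` are hypothetical helpers, not the library's method.

```python
import math

def negbin_log_pmf(k, mu, r):
    """Log-probability of count k under NegBin(mu, r)."""
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )

def negbin_prob_exceed(mu, r, threshold):
    """P(Y > threshold) = 1 - sum of P(Y = k) for k = 0..floor(threshold)."""
    upper = int(math.floor(threshold))
    cdf = sum(math.exp(negbin_log_pmf(k, mu, r)) for k in range(upper + 1))
    return max(0.0, 1.0 - cdf)

# "What's the probability we need more than 100 units?"
p_over_100 = negbin_prob_exceed(mu=80.0, r=5.0, threshold=100)
p_over_50 = negbin_prob_exceed(mu=80.0, r=5.0, threshold=50)
```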

Custom Distributions¶

create_custom_distribution¶

create_custom_distribution ¶

create_custom_distribution(
    param_names,
    link_functions,
    nll_fn,
    mean_fn=None,
    variance_fn=None,
)

Convenience function to create a custom distribution.

Example: Model y ~ Normal(A, sigma) with A > 0

>>> import numpy as np
>>> dist = create_custom_distribution(
...     param_names=['A', 'sigma'],
...     link_functions={'A': 'exp', 'sigma': 'exp'},
...     nll_fn=lambda y, p: 0.5*np.log(2*np.pi*p['sigma']**2) + (y-p['A'])**2/(2*p['sigma']**2),
...     mean_fn=lambda p: p['A'],
...     variance_fn=lambda p: p['sigma']**2,
... )

CustomDistribution¶

CustomDistribution ¶

CustomDistribution(
    param_names,
    link_functions,
    nll_fn,
    mean_fn=None,
    variance_fn=None,
    init_fn=None,
    use_jax=True,
    eps=1e-05,
)

Bases: Distribution

User-defined distribution with automatic gradient computation.

Define any parametric distribution by specifying:

1. Parameter names and link functions
2. Negative log-likelihood function

Gradients are computed automatically via:

- JAX (if available), the fastest option
- Numerical differentiation (fallback)

Example: Custom "ratio" distribution y ~ Normal(A*(1-B)/C, σ)

>>> import numpy as np
>>> def my_nll(y, params):
...     A, B, C, sigma = params['A'], params['B'], params['C'], params['sigma']
...     mu = A * (1 - B) / C
...     return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)
>>> 
>>> dist = CustomDistribution(
...     param_names=['A', 'B', 'C', 'sigma'],
...     link_functions={
...         'A': 'identity',      # A ∈ (-∞, ∞)
...         'B': 'sigmoid',       # B ∈ (0, 1)
...         'C': 'softplus',      # C > 0
...         'sigma': 'exp',       # σ > 0
...     },
...     nll_fn=my_nll,
...     mean_fn=lambda params: params['A'] * (1 - params['B']) / params['C'],
... )
>>> 
>>> model = NGBoost(distribution=dist, n_trees=100)
>>> model.fit(X, y)

For Kaggle competitions with custom evaluation metrics, you can define the NLL to match the competition metric!

Initialize custom distribution.

PARAMETER	DESCRIPTION
`param_names`	List of parameter names (e.g., ['A', 'B', 'sigma']) TYPE: `list[str]`
`link_functions`	Dict mapping each parameter name to a link type: 'identity' (no transformation, param ∈ (-∞, ∞)); 'exp' (exponential, param > 0); 'softplus' (log(1 + exp(x)), param > 0, smoother than exp); 'sigmoid' (1/(1+exp(-x)), param ∈ (0, 1)); 'square' (x², param ≥ 0) TYPE: `dict[str, str]`
`nll_fn`	Function (y, params_dict) -> array of NLL per sample TYPE: `callable`
`mean_fn`	Optional function (params_dict) -> mean prediction TYPE: `callable \| None` DEFAULT: `None`
`variance_fn`	Optional function (params_dict) -> variance TYPE: `callable \| None` DEFAULT: `None`
`init_fn`	Optional function (y) -> dict of initial raw param values TYPE: `callable \| None` DEFAULT: `None`
`use_jax`	Try to use JAX for autodiff (falls back to numerical if unavailable) TYPE: `bool` DEFAULT: `True`
`eps`	Epsilon for numerical gradients TYPE: `float` DEFAULT: `1e-05`

nll_gradient ¶

nll_gradient(y, params)

Compute gradients (auto-selects JAX or numerical).
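The numerical fallback can be pictured as a central difference taken with respect to each raw parameter, using a step like the `eps` argument. The sketch below is one plausible scalar version under that assumption, not the library's code; `LINKS` and `numerical_raw_gradient` are hypothetical names and only two link types are included.

```python
import math

# Hypothetical subset of link functions for this illustration.
LINKS = {"identity": lambda x: x, "exp": math.exp}

def numerical_raw_gradient(nll_fn, y, raw, link_functions, eps=1e-5):
    """Central-difference gradient of nll_fn w.r.t. each raw parameter."""
    def nll_at(raw_values):
        params = {k: LINKS[link_functions[k]](v) for k, v in raw_values.items()}
        return nll_fn(y, params)

    grads = {}
    for name in raw:
        up = dict(raw, **{name: raw[name] + eps})
        down = dict(raw, **{name: raw[name] - eps})
        grads[name] = (nll_at(up) - nll_at(down)) / (2 * eps)
    return grads

def normal_nll(y, p):
    return 0.5 * math.log(2 * math.pi * p["sigma"] ** 2) + (y - p["mu"]) ** 2 / (2 * p["sigma"] ** 2)

grads = numerical_raw_gradient(
    normal_nll,
    y=1.3,
    raw={"mu": 0.4, "sigma": -0.2},
    link_functions={"mu": "identity", "sigma": "exp"},
)
```

Because the perturbation is applied before the link, the result is already the gradient in raw space and no separate chain-rule factor is needed.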

fisher_information ¶

fisher_information(params)

Approximate Fisher information (diagonal).

quantile ¶

quantile(params, q)

Approximate quantile using Normal assumption.

sample ¶

sample(params, n_samples=1, seed=None)

Sample using Normal approximation.

Utilities¶

get_distribution¶

get_distribution ¶

get_distribution(name)

Get distribution by name or return instance.

PARAMETER	DESCRIPTION
`name`	Distribution name or Distribution instance TYPE: `str \| Distribution`

RETURNS	DESCRIPTION
`Distribution`	Distribution instance

Example

dist = get_distribution('normal')
dist = get_distribution('gamma')

list_distributions¶

list_distributions ¶

list_distributions()

List available distribution names.

Base Classes¶

Distribution¶

Distribution ¶

Bases: ABC

Base class for probability distributions.

Subclasses must implement:

- n_params: Number of distributional parameters
- param_names: Names of parameters
- link: Transform raw -> constrained parameter space
- link_inv: Transform constrained -> raw
- nll_gradient: Gradient and hessian of NLL w.r.t. raw parameters
- fisher_information: Fisher information matrix (for NGBoost)
- mean: Expected value E[Y|params]
- variance: Variance Var[Y|params]

n_params `abstractmethod` `property` ¶

n_params

Number of distributional parameters.

param_names `abstractmethod` `property` ¶

param_names

Names of parameters, e.g., ['loc', 'scale'].

link `abstractmethod` ¶

link(param_name, raw)

Apply link function: raw -> constrained parameter space.

E.g., for scale: exp(raw) to ensure positivity.

PARAMETER	DESCRIPTION
`param_name`	Name of the parameter TYPE: `str`
`raw`	Raw (unbounded) values TYPE: `NDArray`

RETURNS	DESCRIPTION
`NDArray`	Constrained parameter values

link_inv `abstractmethod` ¶

link_inv(param_name, param)

Inverse link: constrained -> raw (for initialization).

PARAMETER	DESCRIPTION
`param_name`	Name of the parameter TYPE: `str`
`param`	Constrained parameter values TYPE: `NDArray`

RETURNS	DESCRIPTION
`NDArray`	Raw (unbounded) values
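A link and its inverse should round-trip a raw value. The sketch below pairs the link types named elsewhere on this page with their mathematical inverses for scalars; it is an illustration only, and `softplus`, `softplus_inv`, and `LINK_PAIRS` are hypothetical names. The 'square' link is omitted because it has no unique inverse.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_inv(p):
    """Inverse of softplus for p > 0: log(exp(p) - 1) = p + log(1 - exp(-p))."""
    return p + math.log(-math.expm1(-p))

# (link, inverse link) pairs: raw -> constrained, constrained -> raw
LINK_PAIRS = {
    "identity": (lambda x: x, lambda p: p),
    "exp": (math.exp, math.log),
    "softplus": (softplus, softplus_inv),
    "sigmoid": (lambda x: 1 / (1 + math.exp(-x)), lambda p: math.log(p / (1 - p))),
}

round_trips = {name: inv(fwd(0.7)) for name, (fwd, inv) in LINK_PAIRS.items()}
```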

nll_gradient `abstractmethod` ¶

nll_gradient(y, params)

Compute gradient and hessian of NLL w.r.t. each RAW parameter.

The gradient is d(NLL)/d(raw), accounting for the link function.

PARAMETER	DESCRIPTION
`y`	Observed target values TYPE: `NDArray`
`params`	Dictionary of constrained parameter values TYPE: `dict[str, NDArray]`

RETURNS	DESCRIPTION
`dict[str, GradHess]`	Dictionary mapping param_name -> (gradient, hessian)

fisher_information `abstractmethod` ¶

fisher_information(params)

Fisher information matrix at given parameters.

Shape: (n_samples, n_params, n_params).

Used for natural gradient computation in NGBoost.

PARAMETER	DESCRIPTION
`params`	Dictionary of constrained parameter values TYPE: `dict[str, NDArray]`

RETURNS	DESCRIPTION
`NDArray`	Fisher information matrix

natural_gradient ¶

natural_gradient(y, params)

Compute natural gradient: F^{-1} @ ordinary_gradient.

Natural gradient accounts for the geometry of the parameter space, leading to faster convergence. This is the key insight of NGBoost.

PARAMETER	DESCRIPTION
`y`	Observed target values TYPE: `NDArray`
`params`	Dictionary of constrained parameter values TYPE: `dict[str, NDArray]`

RETURNS	DESCRIPTION
`dict[str, GradHess]`	Dictionary mapping param_name -> (natural_gradient, hessian)
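For a two-parameter distribution, the natural gradient reduces to solving a 2×2 linear system per sample. The sketch below works through a single Normal observation using the gradient and Fisher matrix given for Normal on this page; `natural_gradient_2x2` is a hypothetical helper, not the method documented here.

```python
def natural_gradient_2x2(fisher, grad):
    """Solve F @ nat = grad for a 2x2 Fisher matrix via the closed-form inverse."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    return ((d * grad[0] - b * grad[1]) / det, (a * grad[1] - c * grad[0]) / det)

# Normal with identity link on loc and exp link on scale
y, mu, sigma = 1.3, 0.4, 2.0
grad = (-(y - mu) / sigma**2, 1.0 - (y - mu) ** 2 / sigma**2)
fisher = ((1.0 / sigma**2, 0.0), (0.0, 2.0))
nat = natural_gradient_2x2(fisher, grad)
```

In this case the natural gradient for loc is -(y - μ), independent of σ, so the size of the mean update no longer shrinks when the predicted variance is large.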

init_params ¶

init_params(y)

Initialize parameters from target values.

Returns raw (pre-link) initial values for each parameter.

PARAMETER	DESCRIPTION
`y`	Target values for initialization TYPE: `NDArray`

RETURNS	DESCRIPTION
`dict[str, float]`	Dictionary mapping param_name -> initial raw value

mean `abstractmethod` ¶

mean(params)

Expected value E[Y|params].

variance `abstractmethod` ¶

variance(params)

Variance Var[Y|params].

quantile ¶

quantile(params, q)

q-th quantile of the distribution.

sample ¶

sample(params, n_samples=1, seed=None)

Sample from the distribution.

nll ¶

nll(y, params)

Negative log-likelihood (for evaluation).

DistributionOutput¶

DistributionOutput `dataclass` ¶

DistributionOutput(params, distribution)

Container for distribution parameter predictions.

ATTRIBUTE	DESCRIPTION
`params`	Dictionary mapping parameter names to predicted values TYPE: `dict[str, NDArray]`
`distribution`	The Distribution instance used TYPE: `Distribution`

mean ¶

mean()

Expected value E[Y|X].

variance ¶

variance()

Variance Var[Y|X].

std ¶

std()

Standard deviation.

quantile ¶

quantile(q)

q-th quantile (0 < q < 1).

interval ¶

interval(alpha=0.1)

(1-alpha) prediction interval.

PARAMETER	DESCRIPTION
`alpha`	Significance level (0.1 = 90% interval) TYPE: `float` DEFAULT: `0.1`

RETURNS	DESCRIPTION
`tuple[NDArray, NDArray]`	(lower, upper) bounds
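One common way to build a (1-alpha) interval is to take the alpha/2 and 1 - alpha/2 quantiles. The sketch below does this for a Normal prediction using the standard library; it assumes an equal-tailed interval, which is not confirmed here, and `normal_interval` is a hypothetical name.

```python
from statistics import NormalDist

def normal_interval(loc, scale, alpha=0.1):
    """Equal-tailed (1 - alpha) interval from the alpha/2 and 1 - alpha/2 quantiles."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return loc - z * scale, loc + z * scale

lower, upper = normal_interval(loc=10.0, scale=2.0, alpha=0.1)
coverage = NormalDist(10.0, 2.0).cdf(upper) - NormalDist(10.0, 2.0).cdf(lower)
```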

sample ¶

sample(n_samples=1, seed=None)

Draw samples from the predicted distribution.

PARAMETER	DESCRIPTION
`n_samples`	Number of samples per observation TYPE: `int` DEFAULT: `1`
`seed`	Random seed for reproducibility TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`samples`	Shape (n_observations, n_samples) TYPE: `NDArray`

nll ¶

nll(y)

Negative log-likelihood for observed values.

PARAMETER	DESCRIPTION
`y`	Observed values TYPE: `NDArray`

RETURNS	DESCRIPTION
`nll`	Per-sample negative log-likelihood TYPE: `NDArray`
