Migrating from XGBoost to OpenBoost¶
This guide helps you transition from XGBoost to OpenBoost with minimal changes.
Parameter Mapping¶
XGBoost → OpenBoost¶
| XGBoost Parameter | OpenBoost Parameter | Notes |
|---|---|---|
| `n_estimators` | `n_trees` / `n_estimators` | Same meaning |
| `max_depth` | `max_depth` | Same meaning |
| `learning_rate` / `eta` | `learning_rate` | Same meaning |
| `min_child_weight` | `min_child_weight` | Same meaning |
| `reg_lambda` / `lambda` | `reg_lambda` | L2 regularization |
| `reg_alpha` / `alpha` | `reg_alpha` | L1 regularization |
| `subsample` | `subsample` | Row sampling |
| `colsample_bytree` | `colsample_bytree` | Column sampling |
| `gamma` / `min_split_loss` | `gamma` | Minimum gain to split |
| `objective` | `loss` | See loss mapping below |
Loss Function Mapping¶
| XGBoost Objective | OpenBoost Loss |
|---|---|
| `reg:squarederror` | `'mse'` |
| `reg:absoluteerror` | `'mae'` |
| `reg:pseudohubererror` | `'huber'` |
| `binary:logistic` | `'logloss'` |
| `multi:softmax` | Use `MultiClassGradientBoosting` |
| `multi:softprob` | Use `MultiClassGradientBoosting` |
| `count:poisson` | `'poisson'` |
| `reg:gamma` | `'gamma'` |
| `reg:tweedie` | `'tweedie'` |
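For instance, following the table, an XGBoost Poisson count model maps to the `'poisson'` loss string (a minimal sketch using only the constructors shown in this guide):
# XGBoost
model = xgb.XGBRegressor(n_estimators=100, objective='count:poisson')
# OpenBoost equivalent
model = ob.GradientBoosting(n_trees=100, loss='poisson')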
Code Examples¶
Basic Regression¶
# XGBoost
import xgboost as xgb
model = xgb.XGBRegressor(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    reg_lambda=1.0,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
# OpenBoost equivalent
import openboost as ob
model = ob.GradientBoosting(
    n_trees=100,
    max_depth=6,
    learning_rate=0.1,
    reg_lambda=1.0,
    loss='mse',
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
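To sanity-check the migration, you can score both fitted models on the same test split with scikit-learn metrics. This is a quick check, not part of either library's API; it assumes you keep the two models in separate variables (here `xgb_model` and `ob_model`):
# Optional sanity check: compare test error of the two fitted models
from sklearn.metrics import mean_squared_error

print("XGBoost   MSE:", mean_squared_error(y_test, xgb_model.predict(X_test)))
print("OpenBoost MSE:", mean_squared_error(y_test, ob_model.predict(X_test)))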
Binary Classification¶
# XGBoost
model = xgb.XGBClassifier(
    n_estimators=100,
    max_depth=6,
    objective='binary:logistic',
)
model.fit(X_train, y_train)
pred_proba = model.predict_proba(X_test)[:, 1]
# OpenBoost equivalent
import numpy as np

model = ob.GradientBoosting(
    n_trees=100,
    max_depth=6,
    loss='logloss',
)
model.fit(X_train, y_train)
logits = model.predict(X_test)
pred_proba = 1 / (1 + np.exp(-logits)) # Sigmoid
# Or use sklearn wrapper
from openboost import OpenBoostClassifier
model = OpenBoostClassifier(n_estimators=100, max_depth=6)
model.fit(X_train, y_train)
pred_proba = model.predict_proba(X_test)[:, 1]
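If you also need hard class labels on the manual logits path, threshold the probabilities yourself (plain numpy; the 0.5 cutoff mirrors XGBoost's default):
# Hard 0/1 labels from probabilities
pred_label = (pred_proba >= 0.5).astype(int)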
Multi-Class Classification¶
# XGBoost
model = xgb.XGBClassifier(
    n_estimators=100,
    max_depth=6,
    objective='multi:softprob',
    num_class=5,
)
model.fit(X_train, y_train)
pred_proba = model.predict_proba(X_test)
pred = model.predict(X_test)
# OpenBoost equivalent
model = ob.MultiClassGradientBoosting(
    n_classes=5,
    n_trees=100,
    max_depth=6,
)
model.fit(X_train, y_train)
pred_proba = model.predict_proba(X_test)
pred = model.predict(X_test)
# Or use sklearn wrapper
from openboost import OpenBoostClassifier
model = OpenBoostClassifier(n_estimators=100, max_depth=6)
model.fit(X_train, y_train) # Auto-detects multi-class
sklearn-Compatible API¶
OpenBoost provides drop-in replacements for XGBoost's sklearn API:
# XGBoost sklearn
from xgboost import XGBRegressor, XGBClassifier
# OpenBoost sklearn (same interface!)
from openboost import OpenBoostRegressor, OpenBoostClassifier
# Works with sklearn pipelines
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', OpenBoostRegressor(n_estimators=100)),
])
# Works with cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(OpenBoostRegressor(), X, y, cv=5)
# Works with grid search
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(
    OpenBoostRegressor(),
    {'n_estimators': [50, 100], 'max_depth': [4, 6]},
    cv=3,
)
grid.fit(X, y)
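After fitting, the standard scikit-learn attributes are available for inspecting the search; nothing here is OpenBoost-specific:
print(grid.best_params_)           # best hyperparameter combination found
print(grid.best_score_)            # its mean cross-validated score
best_model = grid.best_estimator_  # estimator refit on all data with those params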
Early Stopping¶
# XGBoost
model = xgb.XGBRegressor(
    n_estimators=1000,
    early_stopping_rounds=10,
)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=False,
)
# OpenBoost equivalent
from openboost import EarlyStopping, Logger
model = ob.GradientBoosting(n_trees=1000)
model.fit(
    X_train, y_train,
    callbacks=[
        EarlyStopping(patience=10),
        Logger(every=10),
    ],
    eval_set=[(X_val, y_val)],
)
Feature Importance¶
# XGBoost
model.fit(X_train, y_train)
importance = model.feature_importances_
# OpenBoost
model.fit(X_train, y_train)
importance = ob.compute_feature_importances(model.trees_)
# Or with sklearn wrapper
from openboost import OpenBoostRegressor
model = OpenBoostRegressor()
model.fit(X_train, y_train)
importance = model.feature_importances_ # Same as XGBoost!
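To visualize importances, a small matplotlib sketch works with either library, since both expose a plain 1-D array; `feature_names` below is an assumed list of your column names:
import numpy as np
import matplotlib.pyplot as plt

importance = np.asarray(importance)
order = np.argsort(importance)[::-1]  # most important features first
plt.bar(range(len(importance)), importance[order])
plt.xticks(range(len(importance)), [feature_names[i] for i in order], rotation=90)
plt.ylabel('Importance')
plt.tight_layout()
plt.show()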
Saving and Loading¶
# XGBoost
model.save_model('model.json')
loaded = xgb.XGBRegressor()
loaded.load_model('model.json')
# OpenBoost
model.save('model.joblib')
loaded = ob.GradientBoosting.load('model.joblib')
# Or with joblib directly (same as XGBoost pickle)
import joblib
joblib.dump(model, 'model.joblib')
loaded = joblib.load('model.joblib')
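A quick round-trip check after loading is cheap insurance during migration (uses only the predict API shown above):
import numpy as np

assert np.allclose(model.predict(X_test), loaded.predict(X_test)), "loaded model differs from original"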
Feature Comparison¶
| Feature | XGBoost | OpenBoost |
|---|---|---|
| GPU Support | ✅ | ✅ |
| Custom Loss | ⚠️ (requires Python wrapper) | ✅ (native Python) |
| Uncertainty | ❌ | ✅ (NaturalBoost) |
| Interpretable GAM | ❌ | ✅ (OpenBoostGAM) |
| Linear Leaves | ❌ | ✅ (LinearLeafGBDT) |
| DART | ✅ | ✅ |
| Growth Strategies | Level-wise | Level-wise, Leaf-wise, Symmetric |
| GOSS Sampling | ❌ (use LightGBM) | ✅ |
| Pure Python | ❌ (C++) | ✅ |
What OpenBoost Does Better¶
1. Uncertainty Quantification¶
# XGBoost: just point predictions
pred = xgb_model.predict(X_test)  # one point estimate per row
# OpenBoost: Full distributions
model = ob.NaturalBoostNormal(n_trees=100)
model.fit(X_train, y_train)
mean = model.predict(X_test)
lower, upper = model.predict_interval(X_test) # 90% interval
samples = model.sample(X_test, n_samples=1000) # Monte Carlo
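One way to sanity-check the intervals is empirical coverage on held-out data: roughly 90% of the true targets should fall inside a 90% interval (plain numpy; `y_test` holds the true values):
import numpy as np

coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage of the 90% interval: {coverage:.1%}")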
2. Custom Loss Functions¶
# XGBoost: Requires Python callback wrapper, tricky to get right
# OpenBoost: Native Python, just return (grad, hess)
import numpy as np

def my_loss(pred, y):
    grad = pred - y               # first derivative of 0.5 * (pred - y)**2
    hess = np.ones_like(pred)     # second derivative
    return grad.astype(np.float32), hess.astype(np.float32)
model = ob.GradientBoosting(loss=my_loss)
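Because the loss is plain Python, asymmetric objectives are just as easy. Here is a sketch of a weighted squared error that penalizes under-prediction three times as much (the 3.0 weight is an arbitrary illustration):
import numpy as np

def asymmetric_mse(pred, y):
    # Weight residuals more heavily where we under-predict (pred < y)
    w = np.where(pred < y, 3.0, 1.0)
    grad = w * (pred - y)   # gradient of 0.5 * w * (pred - y)**2
    hess = w                # matching second derivative
    return grad.astype(np.float32), hess.astype(np.float32)

model = ob.GradientBoosting(loss=asymmetric_mse)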
3. Interpretable Models¶
# XGBoost: SHAP values (post-hoc, expensive)
# OpenBoost: Inherently interpretable GAM
gam = ob.OpenBoostGAM(n_rounds=500)
gam.fit(X_train, y_train)
gam.plot_shape_function(0, feature_name="age")
4. Code Readability¶
OpenBoost is pure Python (see the feature comparison above), so the boosting loop and tree-building code can be read, stepped through, and modified directly rather than traced into C++ internals.
What XGBoost Does Better¶
1. Raw Speed on CPU¶
XGBoost's core is highly optimized C++ and will typically be faster on CPU for very large datasets.
2. Distributed Training (Spark, Dask)¶
XGBoost has mature distributed training support.
3. Community and Ecosystem¶
XGBoost has more examples, tutorials, and community support.
Migration Checklist¶
- [ ] Replace `xgb.XGBRegressor` with `ob.GradientBoosting` or `OpenBoostRegressor`
- [ ] Replace `xgb.XGBClassifier` with `ob.GradientBoosting(loss='logloss')` or `OpenBoostClassifier`
- [ ] Replace `n_estimators` with `n_trees` (or use the sklearn wrapper)
- [ ] Replace `objective` with `loss`
- [ ] Update early stopping syntax
- [ ] Update feature importance code
- [ ] Update save/load code
Gradual Migration¶
You can use both libraries during migration:
import numpy as np
import xgboost as xgb
import openboost as ob
# Keep XGBoost for existing models
xgb_model = xgb.XGBRegressor()
xgb_model.fit(X_train, y_train)
# Try OpenBoost for new features
ob_model = ob.NaturalBoostNormal() # Uncertainty!
ob_model.fit(X_train, y_train)
# Compare predictions
xgb_pred = xgb_model.predict(X_test)
ob_pred = ob_model.predict(X_test)
print(f"Correlation: {np.corrcoef(xgb_pred, ob_pred)[0,1]:.4f}")
Getting Help¶
- Quickstart Guide - Get started with OpenBoost
- Uncertainty Tutorial - Learn NaturalBoost
- Custom Loss Tutorial - Define your own objectives