sklearn Integration
OpenBoost provides scikit-learn-compatible wrappers, so its models can be used with scikit-learn tools such as cross-validation, grid search, and pipelines.
Available Wrappers
| Wrapper | Base Model | Use Case |
|---|---|---|
| `OpenBoostRegressor` | GradientBoosting | Regression |
| `OpenBoostClassifier` | GradientBoosting | Classification |
| `OpenBoostDistributionalRegressor` | NaturalBoost | Probabilistic regression |
| `OpenBoostLinearLeafRegressor` | LinearLeafGBDT | Linear leaf regression |
Basic Usage
from openboost import OpenBoostRegressor, OpenBoostClassifier
# Regressor
reg = OpenBoostRegressor(n_estimators=100, max_depth=6)
reg.fit(X_train, y_train)
print(f"R² Score: {reg.score(X_test, y_test):.4f}")
# Classifier
clf = OpenBoostClassifier(n_estimators=100, max_depth=6)
clf.fit(X_train, y_train)
print(f"Accuracy: {clf.score(X_test, y_test):.4f}")
Cross-Validation
from sklearn.model_selection import cross_val_score
reg = OpenBoostRegressor(n_estimators=100)
scores = cross_val_score(reg, X, y, cv=5)
print(f"CV Score: {scores.mean():.4f} ± {scores.std():.4f}")
Grid Search
from sklearn.model_selection import GridSearchCV
from openboost import OpenBoostRegressor, get_param_grid
# Get suggested parameter grid
param_grid = get_param_grid('regression')
# Or define your own
param_grid = {
    'n_estimators': [100, 300, 500],
    'max_depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
}
search = GridSearchCV(OpenBoostRegressor(), param_grid, cv=5)
search.fit(X, y)
print(f"Best params: {search.best_params_}")
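Grid search cost grows multiplicatively with each parameter, so it can help to count the fits before launching a search. The sketch below uses only the standard library to enumerate the custom grid above. `expand_grid` and `count_fits` are hypothetical helpers written for this illustration, not part of OpenBoost or scikit-learn, and the count assumes `GridSearchCV`'s default `refit=True`, which adds one final fit on the full data.

```python
from itertools import product

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1],
}

def expand_grid(grid):
    """Yield every parameter combination in the grid as a dict."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def count_fits(grid, cv, refit=True):
    """Number of model fits an exhaustive search performs (hypothetical helper)."""
    n_candidates = sum(1 for _ in expand_grid(grid))
    return n_candidates * cv + (1 if refit else 0)

candidates = list(expand_grid(param_grid))
print(len(candidates))               # 27 combinations
print(count_fits(param_grid, cv=5))  # 27 * 5 + 1 = 136 fits
```

With 100 to 500 trees per fit, trimming one value from each list (2 × 2 × 2 = 8 candidates) cuts the work to roughly a third.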
Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from openboost import OpenBoostRegressor
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', OpenBoostRegressor(n_estimators=100)),
])
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
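When a pipeline like the one above is tuned with `GridSearchCV`, scikit-learn addresses each step's parameters as `<step name>__<parameter>`. The sketch below shows that naming convention with plain dictionaries. `prefix_grid` is a hypothetical helper for illustration, and `"model"` is the step name used in the pipeline above.

```python
def prefix_grid(step_name, grid):
    """Prefix each key with '<step_name>__' (scikit-learn's nested-parameter syntax)."""
    return {f"{step_name}__{key}": values for key, values in grid.items()}

# Parameters of the wrapped model, named as they would be on the estimator itself
model_grid = {
    "n_estimators": [100, 300],
    "max_depth": [4, 6],
}

pipeline_grid = prefix_grid("model", model_grid)
print(pipeline_grid)
# {'model__n_estimators': [100, 300], 'model__max_depth': [4, 6]}
```

The resulting dictionary is in the form `GridSearchCV` expects as `param_grid` when the estimator is a pipeline.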
Parameter Mapping
sklearn wrapper parameters map to OpenBoost parameters:
| sklearn Parameter | OpenBoost Parameter |
|---|---|
| `n_estimators` | `n_trees` |
| `max_depth` | `max_depth` |
| `learning_rate` | `learning_rate` |
| `min_samples_leaf` | `min_child_weight` |
| `subsample` | `subsample` |
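The mapping above can be read as a simple renaming. The sketch below encodes it as a dictionary and translates sklearn-style keyword arguments. `SKLEARN_TO_OPENBOOST` and `to_openboost_params` are hypothetical names for illustration, not part of the OpenBoost API, and the sketch only renames keys: it assumes values carry over unchanged, which the table does not state.

```python
# Name mapping copied from the table; the constant itself is hypothetical.
SKLEARN_TO_OPENBOOST = {
    "n_estimators": "n_trees",
    "max_depth": "max_depth",
    "learning_rate": "learning_rate",
    "min_samples_leaf": "min_child_weight",
    "subsample": "subsample",
}

def to_openboost_params(sklearn_params):
    """Rename sklearn-style keys; reject names that are not in the table."""
    unknown = set(sklearn_params) - set(SKLEARN_TO_OPENBOOST)
    if unknown:
        raise KeyError(f"No mapping for: {sorted(unknown)}")
    return {SKLEARN_TO_OPENBOOST[k]: v for k, v in sklearn_params.items()}

translated = to_openboost_params({"n_estimators": 200, "min_samples_leaf": 5})
print(translated)
# {'n_trees': 200, 'min_child_weight': 5}
```

Rejecting unknown names, rather than passing them through silently, makes a misspelled parameter fail early.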
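Out-of-Fold Predictions
from openboost import OpenBoostRegressor, OpenBoostClassifier
from openboost import cross_val_predict, cross_val_predict_proba
# Regression: each row is predicted by a model that did not train on it
model = OpenBoostRegressor(n_estimators=100)
oof_pred = cross_val_predict(model, X, y, cv=5)
# Classification: out-of-fold class probabilities
classifier = OpenBoostClassifier(n_estimators=100)
oof_proba = cross_val_predict_proba(classifier, X, y, cv=5)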
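Out-of-fold prediction means every row is predicted by a model trained only on the folds that exclude it. The sketch below illustrates the idea with the standard library and a trivial mean predictor standing in for a real model. `kfold_indices` and `oof_predict_mean` are hypothetical helpers, and the contiguous, unshuffled fold assignment is an assumption of this sketch, not a description of how OpenBoost's `cross_val_predict` splits the data.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more row
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def oof_predict_mean(y, n_folds):
    """Predict each held-out fold with the mean of the remaining targets."""
    oof = [None] * len(y)
    for held_out in kfold_indices(len(y), n_folds):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        fold_mean = sum(train) / len(train)  # "model" fitted without the held-out rows
        for i in held_out:
            oof[i] = fold_mean
    return oof

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = oof_predict_mean(y, n_folds=3)
print(oof)
# [4.5, 4.5, 3.5, 3.5, 2.5, 2.5]
```

Because no prediction uses its own target, out-of-fold predictions are commonly used as inputs to a second-stage (stacked) model or for an honest error estimate on the training set.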