Gaussian Process Models

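JaxBo provides two Gaussian Process implementations: a base GP class and GPwithClassifier, which pairs the GP with a classifier. The kernel classes and classifier utilities they rely on are documented further down.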

Base GP Class

class BOBE.gp.GP(train_x, train_y, noise=1e-08, kernel='rbf', optimizer='scipy', optimizer_options={}, kernel_variance_bounds=[0.0001, 100000000.0], lengthscale_bounds=[0.01, 5], lengthscales=None, kernel_variance=None, kernel_variance_prior=None, lengthscale_prior=None, tausq=None, tausq_bounds=[0.0001, 10000.0], param_names=None)[source]

Bases: object

__init__(train_x, train_y, noise=1e-08, kernel='rbf', optimizer='scipy', optimizer_options={}, kernel_variance_bounds=[0.0001, 100000000.0], lengthscale_bounds=[0.01, 5], lengthscales=None, kernel_variance=None, kernel_variance_prior=None, lengthscale_prior=None, tausq=None, tausq_bounds=[0.0001, 10000.0], param_names=None)[source]

Initialize the Gaussian Process model.

Parameters:
  • train_x (jnp.ndarray) – Training inputs, shape (N, D).

  • train_y (jnp.ndarray) – Objective function values at training points, shape (N, 1).

  • noise (float, optional) – Noise parameter added to the diagonal of the kernel. Default is 1e-8.

  • kernel (str, optional) – Kernel to use, either “rbf” or “matern”. Default is “rbf”.

  • optimizer (str, optional) – Optimizer to use for hyperparameter tuning. Default is “scipy”.

  • optimizer_options (dict, optional) – Keyword arguments for the optimizer. Default is {}.

  • kernel_variance_bounds (list, optional) – Bounds for the kernel variance. Default is [1e-4, 1e8].

  • lengthscale_bounds (list, optional) – Bounds for the lengthscales. Default is [0.01, 5].

  • lengthscales (jnp.ndarray, optional) – Initial lengthscale values. If None, defaults to ones. Default is None.

  • kernel_variance (float, optional) – Initial kernel variance. If None, defaults to 1.0. Default is None.

  • kernel_variance_prior (dict or str, optional) – Specification for the kernel variance prior. If None, defaults to {‘name’: ‘LogNormal’, ‘loc’: 0.0, ‘scale’: 1.0}. If ‘fixed’, the kernel variance will be fixed to the initial value and not optimized. Defaults to None.

  • lengthscale_prior (str or dict, optional) – Specification for the lengthscale prior. If ‘DSLP’ or None, uses the DSLP prior. If ‘SAAS’, uses the SAAS prior with tausq parameter. Otherwise, uses the provided distribution spec. Defaults to None.

  • tausq (float, optional) – Initial tausq parameter for SAAS prior. Only used when lengthscale_prior=’SAAS’. If None, defaults to 1.0. Defaults to None.

  • tausq_bounds (list, optional) – Bounds for the tausq parameter. Only used when lengthscale_prior=’SAAS’. Defaults to [1e-4, 1e4], which corresponds to [-4, 4] in log10 space.

neg_mll(log_params)[source]

Computes the negative log marginal likelihood for the GP with given hyperparameters.
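
As a rough illustration of the quantity neg_mll evaluates, the following NumPy sketch computes the negative log marginal likelihood of a zero-mean GP with an RBF kernel through a Cholesky factorization. The parameter layout (natural-log lengthscales followed by the natural-log kernel variance) and the helper names are assumptions made for this example; the library's own ordering, log base, and priors may differ.

```python
import numpy as np

def rbf(xa, xb, lengthscales, kernel_variance):
    # Pairwise squared distances after scaling each dimension by its lengthscale.
    a, b = xa / lengthscales, xb / lengthscales
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return kernel_variance * np.exp(-0.5 * np.maximum(d2, 0.0))

def neg_mll_sketch(log_params, train_x, train_y, noise=1e-8):
    # Assumed layout: log lengthscales first, log kernel variance last.
    params = np.exp(log_params)
    lengthscales, kernel_variance = params[:-1], params[-1]
    y = np.asarray(train_y).reshape(-1)
    K = rbf(train_x, train_x, lengthscales, kernel_variance) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, obtained from two triangular systems.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * y^T K^{-1} y + 0.5 * log|K| + 0.5 * N * log(2 pi)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2.0 * np.pi)
```

The log-determinant term uses the identity log|K| = 2 Σ log L_ii, which avoids forming the determinant directly.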

fit(x0=None, maxiter=500)[source]

Performs a serial fit for a given batch of starting points (x0). This method is called by each MPI process on its assigned chunk.

Parameters:
  • x0 (ndarray) – Array of shape (n_restarts_chunk, n_params) containing starting points for optimization (in log space).

  • maxiter (int) – Maximum number of iterations for the optimizer. Defaults to 500.

Returns:

result – Dictionary containing the best ‘mll’ and corresponding ‘params’ (log space) found.

Return type:

dict

update_hyperparams(hyperparams)[source]

Update the GP hyperparameters and recompute the Cholesky and alphas.

predict_mean_single(x)[source]

Predicts the mean at a single point.

predict_var_single(x)[source]
predict_mean_batched(x)[source]
predict_var_batched(x)[source]
predict_single(x)[source]

Predicts the mean and variance of the GP at x without unstandardizing them. Intended for use with acquisition functions such as EI.
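
The standard GP predictive equations behind a single-point prediction can be sketched as follows. This is a NumPy illustration under stated assumptions (zero-mean GP, RBF kernel, already standardized targets); the function and argument names are hypothetical and this is not the library's implementation.

```python
import numpy as np

def rbf(xa, xb, lengthscales, kernel_variance):
    a, b = xa / lengthscales, xb / lengthscales
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return kernel_variance * np.exp(-0.5 * np.maximum(d2, 0.0))

def predict_sketch(x, train_x, train_y_std, lengthscales, kernel_variance, noise=1e-8):
    # Cholesky factor and alphas: the quantities that must be recomputed
    # whenever the hyperparameters or the training set change.
    K = rbf(train_x, train_x, lengthscales, kernel_variance) + noise * np.eye(len(train_x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y_std))
    # Cross-covariance between the training points and the query point.
    k_star = rbf(train_x, x[None, :], lengthscales, kernel_variance)[:, 0]
    mean = float(k_star @ alpha)
    v = np.linalg.solve(L, k_star)
    # Clamp tiny negative values caused by floating-point round-off.
    var = float(max(kernel_variance - v @ v, 0.0))
    return mean, var
```

Both outputs stay in the standardized scale of train_y_std; mapping them back to the original units would be a separate step.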

predict_batched(x)[source]
update(new_x, new_y)[source]

Updates the GP with new training points and refits the GP if refit is True.

Parameters:
  • refit (bool) – Whether to refit the GP hyperparameters. Default is True.

  • maxiter (int) – The maximum number of iterations for the optax optimizer. Default is 200.

  • n_restarts (int) – The number of restarts for the optax optimizer. Default is 4.

recompute_cholesky()[source]

Recomputes the Cholesky decomposition and alphas. Useful if hyperparameters are changed manually.

fantasy_var(new_x, mc_points, k_train_mc)[source]

Computes the variance of the GP at the mc_points, assuming a single point new_x is added to the training set.
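
One common way to obtain such a "fantasy" variance without refitting is a rank-one update of the current posterior: var'(m) = var(m) - cov(m, new_x)² / (var(new_x) + noise). The NumPy sketch below illustrates that identity. The helper names and the argument list are assumptions for the example and do not mirror the signature above, which takes a precomputed k_train_mc.

```python
import numpy as np

def rbf(xa, xb, lengthscales, kernel_variance):
    a, b = xa / lengthscales, xb / lengthscales
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return kernel_variance * np.exp(-0.5 * np.maximum(d2, 0.0))

def posterior_cov(xa, xb, train_x, lengthscales, kernel_variance, noise):
    # Posterior covariance between xa and xb given the training inputs.
    K = rbf(train_x, train_x, lengthscales, kernel_variance) + noise * np.eye(len(train_x))
    Ka = rbf(train_x, xa, lengthscales, kernel_variance)
    Kb = rbf(train_x, xb, lengthscales, kernel_variance)
    return rbf(xa, xb, lengthscales, kernel_variance) - Ka.T @ np.linalg.solve(K, Kb)

def fantasy_var_sketch(new_x, mc_points, train_x, lengthscales, kernel_variance, noise=1e-6):
    new = new_x[None, :]
    args = (train_x, lengthscales, kernel_variance, noise)
    var_mc = np.diag(posterior_cov(mc_points, mc_points, *args))
    cov_mc_new = posterior_cov(mc_points, new, *args)[:, 0]
    var_new = posterior_cov(new, new, *args)[0, 0] + noise
    # Rank-one reduction of the variance caused by observing new_x.
    return var_mc - cov_mc_new**2 / var_new
```

The result does not depend on the value observed at new_x, which is why the variance can be evaluated before the objective is actually queried there.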

get_random_point(rng=None, nstd=None)[source]

Returns a random point in the unit cube.

state_dict()[source]

Returns a dictionary containing the complete state of the GP. This can be used for saving, loading, or copying the GP.

Returns:

state – Dictionary containing all necessary information to reconstruct the GP

Return type:

dict

classmethod from_state_dict(state)[source]

Creates a GP instance from a state dictionary.

Parameters:

state (dict) – State dictionary returned by state_dict()

Returns:

gp – The reconstructed GP object

Return type:

GP

classmethod load(filename, **kwargs)[source]

Loads a GP from a file.

Parameters:
  • filename (str) – The name of the file to load the GP from (with or without .npz extension)

  • **kwargs – Additional keyword arguments to pass to the GP constructor

Returns:

gp – The loaded GP object

Return type:

GP

save(filename='gp')[source]

Save the GP state to a file using state_dict.

Parameters:

filename (str) – The filename to save to (with or without .npz extension). Default is ‘gp’.
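
The "with or without .npz extension" behaviour can be reproduced with plain NumPy, as in the sketch below. It assumes the state is a flat dictionary of arrays; the keys used in the usage note are hypothetical, and the real state_dict may hold additional or nested entries.

```python
import numpy as np

def save_state(state, filename="gp"):
    # np.savez appends '.npz' when the name does not already end with it.
    np.savez(filename, **state)

def load_state(filename):
    # Accept both 'gp' and 'gp.npz'.
    path = filename if filename.endswith(".npz") else filename + ".npz"
    with np.load(path) as data:
        return {key: data[key] for key in data.files}
```

For example, save_state({"train_x": x, "lengthscales": ls}, "run1") followed by load_state("run1.npz") returns the same two arrays.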

copy()[source]

Creates a deep copy of the GP using state_dict.

Returns:

gp_copy – A deep copy of the current GP

Return type:

GP

property npoints
get_hyperparams()[source]
hyperparams_dict()[source]

Gaussian Process with Classifier

GPwithClassifier pairs the GP with a classifier to handle constraints and invalid regions of the input space.

class BOBE.clf_gp.GPwithClassifier(train_x=None, train_y=None, clf_type='svm', clf_settings={}, clf_use_size=10, clf_update_step=1, probability_threshold=0.5, minus_inf=-100000.0, clf_threshold=250.0, gp_threshold=500.0, noise=1e-08, kernel='rbf', optimizer='scipy', optimizer_options={}, kernel_variance_bounds=[0.0001, 100000000.0], lengthscale_bounds=[0.01, 5.0], tausq=None, tausq_bounds=[0.0001, 10000.0], kernel_variance_prior=None, lengthscale_prior=None, lengthscales=None, kernel_variance=1.0, param_names=None, train_clf_on_init=True)[source]

Bases: GP

__init__(train_x=None, train_y=None, clf_type='svm', clf_settings={}, clf_use_size=10, clf_update_step=1, probability_threshold=0.5, minus_inf=-100000.0, clf_threshold=250.0, gp_threshold=500.0, noise=1e-08, kernel='rbf', optimizer='scipy', optimizer_options={}, kernel_variance_bounds=[0.0001, 100000000.0], lengthscale_bounds=[0.01, 5.0], tausq=None, tausq_bounds=[0.0001, 10000.0], kernel_variance_prior=None, lengthscale_prior=None, lengthscales=None, kernel_variance=1.0, param_names=None, train_clf_on_init=True)[source]

Generic Classifier-GP class combining a GP with a classifier. The GP is trained on the data points that are within the GP threshold of the maximum value of the GP.

Parameters:
  • train_x (array-like, shape (n_samples, n_dim)) – Initial training points.

  • train_y (array-like, shape (n_samples,)) – Initial training values.

  • clf_type (str, optional) – Type of classifier (‘svm’, ‘nn’, ‘ellipsoid’, etc.). Default is ‘svm’.

  • clf_settings (dict, optional) – Settings specific to the chosen classifier. Default is {}.

  • clf_use_size (int, optional) – Minimum number of points to start using the classifier. Default is 10.

  • clf_update_step (int, optional) – Update classifier every clf_update_step points after clf_use_size is reached. Default is 1.

  • probability_threshold (float, optional) – Threshold for classifier probability/score to consider a point feasible (important for nn, ellipsoid). Default is 0.5.

  • minus_inf (float, optional) – Value used for infeasible predictions. Default is -1e5.

  • clf_threshold (float, optional) – Threshold for initial classifier training labels (if used). If None, gp_threshold might be used or a default calculated. Default is 250.

  • gp_threshold (float, optional) – Threshold for adding points to the GP training set. Default is 500.

  • noise, kernel, optimizer, kernel_variance_bounds, lengthscale_bounds, lengthscale_prior, lengthscales, kernel_variance – GP parameters, as documented for the base GP class. Note: bounds are now in actual space, not log10.
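
One plausible reading of how the thresholds interact is sketched below in NumPy: points within clf_threshold of the best value are labelled feasible for the classifier, points within gp_threshold of the best value enter the GP training set, and predictions are replaced by minus_inf wherever the classifier score falls below probability_threshold. This interpretation and both helper names are assumptions for illustration; the library's actual logic may differ.

```python
import numpy as np

def split_by_thresholds(train_y, clf_threshold=250.0, gp_threshold=500.0):
    y = np.asarray(train_y).reshape(-1)
    best = y.max()
    # Assumed rule: feasibility labels for the classifier.
    clf_labels = y > best - clf_threshold
    # Assumed rule: points the GP is trained on.
    gp_mask = y > best - gp_threshold
    return clf_labels, gp_mask

def masked_mean(gp_mean, clf_prob, probability_threshold=0.5, minus_inf=-1e5):
    # Keep the GP mean where the classifier deems the point feasible.
    return np.where(clf_prob >= probability_threshold, gp_mean, minus_inf)
```

With the default thresholds, values [0, -100, -300, -1000] would give classifier labels [True, True, False, False] and a GP mask [True, True, True, False].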

train_classifier()[source]

Public method to train/retrain the classifier.

predict_mean_single(x)[source]

Predicts the mean at a single point.

predict_var_single(x)[source]
predict_mean_batched(x)[source]
predict_var_batched(x)[source]
predict_single(x)[source]

Predicts the mean and variance of the GP at x without unstandardizing them. Intended for use with acquisition functions such as EI.

fantasy_var(new_x, mc_points, k_train_mc)[source]

Computes the fantasy variance; see gp.py for more details. Classifier logic could potentially be added here if needed.

update(new_x, new_y)[source]

Updates the classifier and GP training sets. Retrains classifier/GP based on thresholds and steps.

kernel(x1, x2, lengthscales, kernel_variance, noise, include_noise=True)[source]

Returns the kernel function used by the GP.

get_random_point(rng=None, nstd=None)[source]

Returns a random point in the unit cube.

state_dict()[source]

Returns a dictionary containing the complete state of the GPwithClassifier. This can be used for saving, loading, or copying the GPwithClassifier.

Returns:

state – Dictionary containing all necessary information to reconstruct the GPwithClassifier

Return type:

dict

classmethod from_state_dict(state)[source]

Creates a GPwithClassifier instance from a state dictionary.

Parameters:

state (dict) – State dictionary returned by state_dict()

Returns:

gp_clf – The reconstructed GPwithClassifier object

Return type:

GPwithClassifier

save(filename='gp')[source]

Save the GPwithClassifier state to a file using state_dict.

Parameters:

filename (str) – The filename to save to (with or without .npz extension). Default is ‘gp’.

classmethod load(filename, **kwargs)[source]

Loads a GPwithClassifier from a file.

Parameters:
  • filename (str) – The name of the file to load the GPwithClassifier from (with or without .npz extension)

  • **kwargs – Additional keyword arguments to pass to the GPwithClassifier constructor

Returns:

gp_clf – The loaded GPwithClassifier object

Return type:

GPwithClassifier

copy()[source]

Creates a deep copy of the GPwithClassifier using state_dict.

Returns:

gp_clf_copy – A deep copy of the current GPwithClassifier

Return type:

GPwithClassifier

property clf_data_size

Size of the classifier’s training inputs.

property npoints

Kernel Functions

JaxBo uses object-oriented kernel implementations for GP covariance computation.

class BOBE.kernels.Kernel(lengthscales, kernel_variance, noise=1e-08)[source]

Bases: ABC

Abstract base class for all kernels in BOBE.

lengthscales

Lengthscale parameters for each dimension, shape (D,)

Type:

jnp.ndarray

kernel_variance

Overall variance/amplitude of the kernel

Type:

float

noise

Observation noise level

Type:

float

__init__(lengthscales, kernel_variance, noise=1e-08)[source]

Initialize kernel with hyperparameters.

Parameters:
  • lengthscales (jnp.ndarray) – Lengthscale for each input dimension

  • kernel_variance (float) – Kernel variance/amplitude parameter

  • noise (float, optional) – Noise level added to diagonal. Default is 1e-8.

sq_dist(xa, xb)[source]

Compute squared Euclidean distance between two sets of points.

This utility method is used by many kernel implementations.

Parameters:
  • xa (jnp.ndarray) – First set of points, shape (n1, D)

  • xb (jnp.ndarray) – Second set of points, shape (n2, D)

Returns:

sq_dist – Squared distances, shape (n1, n2)

Return type:

jnp.ndarray
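
A minimal NumPy sketch of a pairwise squared-distance computation with the documented input and output shapes is shown below. It uses the expansion ||a - b||² = ||a||² + ||b||² - 2 a·b and is an illustration, not the library's code.

```python
import numpy as np

def sq_dist(xa, xb):
    # Row-wise squared norms, broadcast to an (n1, n2) grid.
    aa = np.sum(xa**2, axis=1)[:, None]
    bb = np.sum(xb**2, axis=1)[None, :]
    d2 = aa + bb - 2.0 * xa @ xb.T
    # Round-off can produce tiny negative entries; clamp them to zero.
    return np.maximum(d2, 0.0)
```

The expansion avoids building an (n1, n2, D) intermediate array, at the cost of some cancellation error for nearly identical points.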

abstractmethod covariance(xa, xb, include_noise=True)[source]

Compute covariance matrix between two sets of points.

Parameters:
  • xa (jnp.ndarray) – First set of points, shape (n1, D)

  • xb (jnp.ndarray) – Second set of points, shape (n2, D)

  • include_noise (bool, optional) – Whether to add noise to diagonal (only when xa is xb). Default is True.

Returns:

K – Covariance matrix of shape (n1, n2)

Return type:

jnp.ndarray

diagonal(x, include_noise=True)[source]

Compute only the diagonal of the kernel matrix K(x,x).

For stationary kernels, the diagonal is constant: kernel_variance (+ noise). Override this method if your kernel has a non-constant diagonal.

Parameters:
  • x (jnp.ndarray) – Points at which to compute diagonal, shape (n, D)

  • include_noise (bool, optional) – Whether to include noise in diagonal. Default is True.

Returns:

diag – Diagonal values, shape (n,)

Return type:

jnp.ndarray

update_hyperparams(lengthscales=None, kernel_variance=None, noise=None)[source]

Update kernel hyperparameters.

Parameters:
  • lengthscales (jnp.ndarray, optional) – New lengthscale values

  • kernel_variance (float, optional) – New kernel variance

  • noise (float, optional) – New noise level

__call__(xa, xb, include_noise=True)[source]

Convenience method - same as covariance()

class BOBE.kernels.RBFKernel(lengthscales, kernel_variance, noise=1e-08)[source]

Bases: Kernel

Radial Basis Function (RBF) / Squared Exponential kernel.

k(x, x’) = σ² * exp(-0.5 * ||x - x’||²/ℓ²)

where σ² is kernel_variance and ℓ is lengthscale.

covariance(xa, xb, include_noise=True)[source]

Compute RBF covariance matrix.

Parameters:
  • xa (jnp.ndarray) – First set of input points, shape (n1, d).

  • xb (jnp.ndarray) – Second set of input points, shape (n2, d).

  • include_noise (bool, optional) – Whether to include noise on diagonal. Default is True.

Returns:

Kernel matrix of shape (n1, n2).

Return type:

jnp.ndarray
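
The RBF formula above, with one lengthscale per input dimension, can be sketched in NumPy as follows. Using an identity check (xa is xb) to decide when noise is added is an assumption made for this example, as is the function name.

```python
import numpy as np

def rbf_covariance(xa, xb, lengthscales, kernel_variance, noise=1e-8, include_noise=True):
    # Scale each dimension by its lengthscale before taking distances.
    a, b = xa / lengthscales, xb / lengthscales
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    K = kernel_variance * np.exp(-0.5 * np.maximum(d2, 0.0))
    # Assumed convention: noise only makes sense on the diagonal of K(x, x).
    if include_noise and xa is xb:
        K = K + noise * np.eye(xa.shape[0])
    return K
```

On the diagonal the distance is zero, so each entry equals kernel_variance (plus noise when included), which matches the constant diagonal described for stationary kernels.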

class BOBE.kernels.MaternKernel(lengthscales, kernel_variance, noise=1e-08)[source]

Bases: Kernel

Matérn-5/2 kernel.

k(x, x’) = σ² * (1 + √5*d + 5*d²/3) * exp(-√5*d)

where d = ||x - x’||/ℓ, σ² is kernel_variance, and ℓ is lengthscale.

covariance(xa, xb, include_noise=True)[source]

Compute Matérn-5/2 covariance matrix.

Parameters:
  • xa (jnp.ndarray) – First set of input points, shape (n1, d).

  • xb (jnp.ndarray) – Second set of input points, shape (n2, d).

  • include_noise (bool, optional) – Whether to include noise on diagonal. Default is True.

Returns:

Kernel matrix of shape (n1, n2).

Return type:

jnp.ndarray
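
A NumPy sketch of the Matérn-5/2 formula above, again with per-dimension lengthscales, is given below. The function name is hypothetical and the noise term is omitted for brevity.

```python
import numpy as np

def matern52_covariance(xa, xb, lengthscales, kernel_variance):
    a, b = xa / lengthscales, xb / lengthscales
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Scaled distance d = ||x - x'|| / lengthscale.
    d = np.sqrt(np.maximum(d2, 0.0))
    s = np.sqrt(5.0) * d
    # s**2 / 3 equals 5 * d**2 / 3 from the formula.
    return kernel_variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)
```

When a kernel like this is differentiated in JAX, the square root at d = 0 can yield NaN gradients, so implementations commonly add a small jitter inside the square root.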

Classifier Module

BOBE.clf.train_svm_classifier(X, Y, settings={}, init_params=None, **kwargs)[source]

Train SVM classifier and return parameters, metrics, and predict function.

BOBE.clf.get_svm_predict_proba_fn(params)[source]

Get prediction function for SVM classifier from parameters (for loading from file).

BOBE.clf.train_nn_classifier(X, Y, settings={}, init_params=None, **kwargs)[source]

Train neural network classifier and return parameters, metrics, and predict function.

BOBE.clf.get_nn_predict_proba_fn(params, settings={}, **kwargs)[source]

Get prediction function for NN classifier from parameters (for loading from file).

BOBE.clf.train_ellipsoid_classifier(X, Y, settings={}, init_params=None, **kwargs)[source]

Train ellipsoid classifier and return parameters, metrics, and predict function.

BOBE.clf.get_ellipsoid_predict_proba_fn(params, settings, d, **kwargs)[source]

Get prediction function for ellipsoid classifier from parameters (for loading from file).

BOBE.clf.svm_predict(x, support_vectors, dual_coef, intercept, gamma)[source]

Compute the decision function for SVM with RBF kernel.

Parameters:
  • x (Array) – Input data point, shape (n_features,)

  • support_vectors (Array) – JAX array of support vectors, shape (n_sv, n_features)

  • dual_coef (Array) – JAX array of dual coefficients, shape (n_sv,)

  • intercept (float) – Scalar bias term.

  • gamma (float) – RBF kernel gamma parameter.

Returns:

Decision function value (scalar). Sign of this value gives the predicted class.
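
The RBF-kernel SVM decision function has the standard form f(x) = Σ_i dual_coef_i · exp(-gamma · ||x - sv_i||²) + intercept. A NumPy sketch using the documented argument shapes follows; the function name is chosen for this example.

```python
import numpy as np

def svm_decision(x, support_vectors, dual_coef, intercept, gamma):
    # Squared distance from x to every support vector, shape (n_sv,).
    d2 = np.sum((support_vectors - x) ** 2, axis=1)
    # Weighted sum of RBF kernel values plus the bias term.
    return float(dual_coef @ np.exp(-gamma * d2) + intercept)
```

With two support vectors of opposite sign and a zero intercept, the midpoint between them evaluates to zero, which is the decision boundary.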

BOBE.clf.svm_predict_proba(x, support_vectors, dual_coef, intercept, gamma)[source]
BOBE.clf.train_with_restarts(train_fn, x, y, n_restarts=2, init_params=None, **train_kwargs)[source]

Train model with multiple restarts using the entire dataset.

Parameters:
  • train_fn (Callable) – Training function that returns (params, metrics)

  • x (Array) – (N, d) features

  • y (Array) – (N,) labels

  • n_restarts (int) – number of random restarts

  • init_params – initial parameters for first restart

  • **train_kwargs – passed to train_fn

Return type:

Tuple[Dict, Dict]
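
A restart loop of this kind can be sketched as below. The selection criterion (a "loss" entry in the metrics dictionary) and the seed keyword passed to train_fn are hypothetical choices for this example; the source only states that train_fn returns (params, metrics).

```python
def train_with_restarts_sketch(train_fn, x, y, n_restarts=2, init_params=None, **train_kwargs):
    best_params, best_metrics = None, None
    for i in range(n_restarts):
        # Only the first restart starts from the supplied parameters.
        start = init_params if i == 0 else None
        # Hypothetical 'seed' keyword so that later restarts differ.
        params, metrics = train_fn(x, y, init_params=start, seed=i, **train_kwargs)
        # Hypothetical criterion: keep the run with the lowest loss.
        if best_metrics is None or metrics["loss"] < best_metrics["loss"]:
            best_params, best_metrics = params, metrics
    return best_params, best_metrics
```

Every restart trains on the full dataset, so the runs differ only in their initialization.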

BOBE.clf.train_nn(model, x_train, y_train, init_params=None, **kwargs)[source]

Simplified NN training using the entire dataset.

BOBE.clf.train_nn_multiple_restarts(model, x, y, **kwargs)[source]

Wrapper for NN training with restarts.

BOBE.clf.train_ellipsoid(model, x_train, y_train, init_params=None, **kwargs)[source]

Simplified ellipsoid training using the entire dataset.

BOBE.clf.train_ellipsoid_multiple_restarts(model, x, y, **kwargs)[source]

Wrapper for ellipsoid training with restarts.