codes.benchmark package#

Submodules#

codes.benchmark.bench_fcts module#

codes.benchmark.bench_fcts.compare_MAE(metrics, config)#

Compare the MAE of different surrogate models over the course of training.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_UQ(all_metrics, config)#

Compare the uncertainty quantification (UQ) metrics of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_batchsize(all_metrics, config)#

Compare the batch size training errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_dynamic_accuracy(metrics, config)#

Compare the dynamic accuracy (correlation between data gradients and prediction errors) of different surrogate models.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_extrapolation(all_metrics, config)#

Compare the extrapolation errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_inference_time(metrics, config, save=True)#

Compare the mean inference time of different surrogate models.

Parameters:
  • metrics (dict[str, dict]) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_interpolation(all_metrics, config)#

Compare the interpolation errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_main_losses(metrics, config)#

Compare the training and test losses of the main models for different surrogate models.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_models(metrics, config)#
codes.benchmark.bench_fcts.compare_relative_errors(metrics, config)#

Compare the relative errors over time for different surrogate models.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.compare_sparse(all_metrics, config)#

Compare the sparse training errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.evaluate_UQ(model, surr_name, test_loader, timesteps, conf, labels=None)#

Evaluate the uncertainty quantification (UQ) performance of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

  • labels (list, optional) – The labels for the chemical species.

Returns:

A dictionary containing UQ metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_accuracy(model, surr_name, test_loader, conf, labels=None)#

Evaluate the accuracy of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

  • labels (list, optional) – The labels for the chemical species.

Returns:

A dictionary containing accuracy metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_batchsize(model, surr_name, test_loader, timesteps, conf)#

Evaluate the performance of the surrogate model with different batch sizes.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing batch size training metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_compute(model, surr_name, test_loader, conf)#

Evaluate the computational resource requirements of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing model complexity metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_dynamic_accuracy(model, surr_name, test_loader, conf, species_names=None)#

Evaluate the gradients of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

  • species_names (list, optional) – The names of the chemical species.

Returns:

A dictionary containing gradient metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_extrapolation(model, surr_name, test_loader, timesteps, conf)#

Evaluate the extrapolation performance of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing extrapolation metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_interpolation(model, surr_name, test_loader, timesteps, conf)#

Evaluate the interpolation performance of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing interpolation metrics.

Return type:

dict

codes.benchmark.bench_fcts.evaluate_sparse(model, surr_name, test_loader, timesteps, n_train_samples, conf)#

Evaluate the performance of the surrogate model with sparse training data.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • n_train_samples (int) – The number of training samples in the full dataset.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing sparse training metrics.

Return type:

dict

codes.benchmark.bench_fcts.run_benchmark(surr_name, surrogate_class, conf)#

Run benchmarks for a given surrogate model.

Parameters:
  • surr_name (str) – The name of the surrogate model to benchmark.

  • surrogate_class – The class of the surrogate model.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing all relevant metrics for the given model.

Return type:

dict
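
A minimal driver sketch, assuming a valid benchmark configuration on disk; the config path and the surrogate name "FullyConnected" are hypothetical:

```python
from codes.benchmark import (
    check_benchmark,
    get_surrogate,
    read_yaml_config,
    run_benchmark,
)

conf = read_yaml_config("config.yaml")  # hypothetical path
check_benchmark(conf)  # raises if the training run or its .yaml file is missing

surrogate_class = get_surrogate("FullyConnected")  # hypothetical model name
if surrogate_class is not None:
    metrics = run_benchmark("FullyConnected", surrogate_class, conf)
    print(sorted(metrics.keys()))
```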

codes.benchmark.bench_fcts.tabular_comparison(all_metrics, config)#

Compare the metrics of different surrogate models in a tabular format.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_fcts.time_inference(model, surr_name, test_loader, conf, n_test_samples, n_runs=5)#

Time the inference of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

  • n_test_samples (int) – The number of test samples.

  • n_runs (int, optional) – Number of times to run the inference for timing.

Returns:

A dictionary containing timing metrics.

Return type:

dict

codes.benchmark.bench_plots module#

codes.benchmark.bench_plots.get_custom_palette(num_colors)#

Returns a list of colors sampled from a custom color palette.

Parameters:

num_colors (int) – The number of colors needed.

Returns:

A list of RGBA color tuples.

Return type:

list
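
A short sketch of feeding the palette to matplotlib; the plotting code is illustrative and not part of this module:

```python
import matplotlib.pyplot as plt

from codes.benchmark import get_custom_palette

colors = get_custom_palette(3)  # list of RGBA tuples
for i, color in enumerate(colors):
    plt.plot(range(10), [i + x for x in range(10)], color=color, label=f"model {i}")
plt.legend()
plt.show()
```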

codes.benchmark.bench_plots.inference_time_bar_plot(surrogates, means, stds, config, save=True)#

Plot the mean inference time with standard deviation for different surrogate models.

Parameters:
  • surrogates (List[str]) – List of surrogate model names.

  • means (List[float]) – List of mean inference times for each surrogate model.

  • stds (List[float]) – List of standard deviation of inference times for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.int_ext_sparse(all_metrics, config)#

Make one comparative plot of the interpolation, extrapolation, and sparse training errors.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_MAE_comparison(MAEs, labels, config, save=True)#

Plot the MAE for different surrogate models.

Parameters:
  • MAEs (tuple) – Tuple of MAE arrays for each surrogate model.

  • labels (tuple) – Tuple of labels for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

codes.benchmark.bench_plots.plot_MAE_comparison_train_duration(MAEs, labels, train_durations, config, save=True)#

Plot the MAE for different surrogate models alongside their training durations.

Parameters:
  • MAEs (tuple) – Tuple of MAE arrays for each surrogate model.

  • labels (tuple) – Tuple of labels for each surrogate model.

  • train_durations (tuple) – Tuple of training durations for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

codes.benchmark.bench_plots.plot_average_errors_over_time(surr_name, conf, errors, metrics, timesteps, mode, save=False)#

Plot the errors over time for different modes (interpolation, extrapolation, sparse, batchsize).

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • errors (np.ndarray) – Errors array of shape [N_metrics, n_timesteps].

  • metrics (np.ndarray) – Metrics array of shape [N_metrics].

  • timesteps (np.ndarray) – Timesteps array.

  • mode (str) – The mode of evaluation (‘interpolation’, ‘extrapolation’, ‘sparse’, ‘batchsize’).

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.bench_plots.plot_average_uncertainty_over_time(surr_name, conf, errors_time, preds_std, timesteps, save=False)#

Plot the average uncertainty over time.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • errors_time (np.ndarray) – Prediction errors over time.

  • preds_std (np.ndarray) – Standard deviation over time of predictions from the ensemble of models.

  • timesteps (np.ndarray) – Timesteps array.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.bench_plots.plot_comparative_dynamic_correlation_heatmaps(gradients, errors, avg_correlations, max_grad, max_err, max_count, config, save=True)#

Plot comparative heatmaps of correlation between gradient and prediction errors for multiple surrogate models.

Parameters:
  • gradients (dict[str, np.ndarray]) – Dictionary of gradients from the ensemble of models.

  • errors (dict[str, np.ndarray]) – Dictionary of prediction errors.

  • avg_correlations (dict[str, float]) – Dictionary of average correlations between gradients and prediction errors.

  • max_grad (dict[str, float]) – Dictionary of maximum gradient values for axis scaling across models.

  • max_err (dict[str, float]) – Dictionary of maximum error values for axis scaling across models.

  • max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_comparative_error_correlation_heatmaps(preds_std, errors, avg_correlations, axis_max, max_count, config, save=True)#

Plot comparative heatmaps of correlation between predictive uncertainty and prediction errors for multiple surrogate models.

Parameters:
  • preds_std (dict[str, np.ndarray]) – Dictionary of standard deviation of predictions from the ensemble of models.

  • errors (dict[str, np.ndarray]) – Dictionary of prediction errors.

  • avg_correlations (dict[str, float]) – Dictionary of average correlations between predictive uncertainty and prediction errors.

  • axis_max (dict[str, float]) – Dictionary of maximum values for axis scaling across models.

  • max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_dynamic_correlation(surr_name, conf, gradients, errors, save=False)#

Plot the correlation between the gradients of the data and the prediction errors.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • gradients (np.ndarray) – The gradients of the data.

  • errors (np.ndarray) – The prediction errors.

  • save (bool) – Whether to save the plot.

codes.benchmark.bench_plots.plot_dynamic_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.0001, xcut_percent=0.003)#

Plot the correlation between predictive uncertainty and prediction errors using a heatmap.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • errors (np.ndarray) – Prediction errors.

  • average_correlation (float) – The average correlation between gradients and prediction errors (Pearson correlation).

  • save (bool, optional) – Whether to save the plot as a file.

  • threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 1e-4.

  • xcut_percent (float, optional) – The fraction used to determine the x-axis cutoff of the heatmap. Default is 0.003.

Return type:

None

codes.benchmark.bench_plots.plot_error_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.01)#

Plot the correlation between predictive uncertainty and prediction errors using a heatmap.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • errors (np.ndarray) – Prediction errors.

  • average_correlation (float) – The average correlation between predictive uncertainty and prediction errors (Pearson correlation).

  • save (bool, optional) – Whether to save the plot as a file.

  • threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 0.01.

Return type:

None

codes.benchmark.bench_plots.plot_error_distribution_comparative(errors, conf, save=True)#

Plot the comparative distribution of errors for each surrogate model as a smoothed histogram plot.

Parameters:
  • conf (dict) – The configuration dictionary.

  • errors (dict) – Dictionary containing numpy arrays of shape [num_samples, num_timesteps, num_chemicals] for each model.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.bench_plots.plot_error_distribution_per_chemical(surr_name, conf, errors, chemical_names=None, num_chemicals=10, save=True)#

Plot the distribution of errors for each chemical as a smoothed histogram plot.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • errors (np.ndarray) – Errors array of shape [num_samples, num_timesteps, num_chemicals].

  • chemical_names (list, optional) – List of chemical names for labeling the lines.

  • num_chemicals (int, optional) – Number of chemicals to plot. Default is 10.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.bench_plots.plot_example_predictions_with_uncertainty(surr_name, conf, preds_mean, preds_std, targets, timesteps, example_idx=0, num_chemicals=100, labels=None, save=False)#

Plot example predictions with uncertainty.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_mean (np.ndarray) – Mean predictions from the ensemble of models.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • targets (np.ndarray) – True targets.

  • timesteps (np.ndarray) – Timesteps array.

  • example_idx (int, optional) – Index of the example to plot. Default is 0.

  • num_chemicals (int, optional) – Number of chemicals to plot. Default is 100.

  • labels (list, optional) – List of labels for the chemicals.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.bench_plots.plot_generalization_error_comparison(surrogates, metrics_list, model_errors_list, xlabel, filename, config, save=True, xlog=False)#

Plot the generalization errors of different surrogate models.

Parameters:
  • surrogates (list) – List of surrogate model names.

  • metrics_list (list[np.array]) – List of numpy arrays containing the metrics for each surrogate model.

  • model_errors_list (list[np.array]) – List of numpy arrays containing the errors for each surrogate model.

  • xlabel (str) – Label for the x-axis.

  • filename (str) – Filename to save the plot.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

  • xlog (bool) – Whether to use a log scale for the x-axis.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_generalization_errors(surr_name, conf, metrics, model_errors, mode, save=False)#

Plot the generalization errors of a model for various metrics.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • metrics (np.ndarray) – The metrics (e.g., intervals, cutoffs, batch sizes, number of training samples).

  • model_errors (np.ndarray) – The model errors.

  • mode (str) – The mode of generalization (“interpolation”, “extrapolation”, “sparse”, “batchsize”).

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_loss_comparison(train_losses, test_losses, labels, config, save=True)#

Plot the training and test losses for different surrogate models.

Parameters:
  • train_losses (tuple) – Tuple of training loss arrays for each surrogate model.

  • test_losses (tuple) – Tuple of test loss arrays for each surrogate model.

  • labels (tuple) – Tuple of labels for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_losses(loss_histories, labels, title='Losses', save=False, conf=None, surr_name=None, mode='main')#

Plot the loss trajectories for the training of multiple models.

Parameters:
  • loss_histories (tuple[array, ...]) – Tuple of loss history arrays.

  • labels (tuple[str, ...]) – Tuple of labels for each loss history.

  • title (str) – Title of the plot.

  • save (bool) – Whether to save the plot as an image file.

  • conf (Optional[dict]) – The configuration dictionary.

  • surr_name (Optional[str]) – The name of the surrogate model.

  • mode (str) – The mode of the training.

Return type:

None
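
A minimal sketch with synthetic loss histories; the arrays and labels are made up for illustration:

```python
import numpy as np

from codes.benchmark import plot_losses

# Two synthetic, exponentially decaying loss curves.
history_a = np.logspace(0, -2, num=200)
history_b = np.logspace(0, -1.5, num=200)
plot_losses((history_a, history_b), ("ModelA", "ModelB"), title="Losses")
```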

codes.benchmark.bench_plots.plot_relative_errors(mean_errors, median_errors, timesteps, config, save=True)#

Plot the relative errors over time for different surrogate models.

Parameters:
  • mean_errors (dict) – dictionary containing the mean relative errors for each surrogate model.

  • median_errors (dict) – dictionary containing the median relative errors for each surrogate model.

  • timesteps (np.ndarray) – Array of timesteps.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_relative_errors_over_time(surr_name, conf, relative_errors, title, save=False)#

Plot the mean and median relative errors over time with shaded regions for the 50th, 90th, and 99th percentiles.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • relative_errors (np.ndarray) – The relative errors of the model.

  • title (str) – The title of the plot.

  • save (bool) – Whether to save the plot.

Return type:

None

codes.benchmark.bench_plots.plot_surr_losses(model, surr_name, conf, timesteps)#

Plot the training and test losses for the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • timesteps (np.ndarray) – The timesteps array.

Return type:

None

codes.benchmark.bench_plots.plot_uncertainty_over_time_comparison(uncertainties, absolute_errors, timesteps, config, save=True)#

Plot the uncertainty over time for different surrogate models.

Parameters:
  • uncertainties (dict) – Dictionary containing the uncertainties for each surrogate model.

  • absolute_errors (dict) – Dictionary containing the absolute errors for each surrogate model.

  • timesteps (np.ndarray) – Array of timesteps.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.plot_uncertainty_vs_errors(surr_name, conf, preds_std, errors, save=False)#

Plot the correlation between predictive uncertainty and prediction errors.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • errors (np.ndarray) – Prediction errors.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.bench_plots.rel_errors_and_uq(metrics, config, save=True)#

Create a figure with two subplots: relative errors over time and uncertainty over time for different surrogate models.

Parameters:
  • metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.bench_plots.save_plot(plt, filename, conf, surr_name='', dpi=300, base_dir='plots', increase_count=False)#

Save the plot to a file, creating necessary directories if they don’t exist.

Parameters:
  • plt (matplotlib.pyplot) – The plot object to save.

  • filename (str) – The desired filename for the plot.

  • conf (dict) – The configuration dictionary.

  • surr_name (str) – The name of the surrogate model.

  • dpi (int) – The resolution of the saved plot.

  • base_dir (str, optional) – The base directory where plots will be saved. Default is “plots”.

  • increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is False.

Raises:

ValueError – If the configuration dictionary does not contain the required keys.

Return type:

None

codes.benchmark.bench_plots.save_plot_counter(filename, directory, increase_count=True)#

Save a plot with an incremented filename if a file with the same name already exists.

Parameters:
  • filename (str) – The desired filename for the plot.

  • directory (str) – The directory to save the plot in.

  • increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is True.

Returns:

The full path to the saved plot.

Return type:

str
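
A one-call sketch, assuming a writable directory; note that a current matplotlib figure may be required if the function itself performs the save:

```python
import tempfile

from codes.benchmark import save_plot_counter

directory = tempfile.mkdtemp()
path = save_plot_counter("losses.png", directory)
print(path)  # full path; the name is incremented if losses.png already exists
```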

codes.benchmark.bench_utils module#

codes.benchmark.bench_utils.check_benchmark(conf)#

Check whether there are any configuration issues with the benchmark.

Parameters:

conf (dict) – The configuration dictionary.

Raises:
  • FileNotFoundError – If the training ID directory is missing or if the .yaml file is missing.

  • ValueError – If the configuration is missing required keys or the values do not match the training configuration.

Return type:

None

codes.benchmark.bench_utils.check_surrogate(surrogate, conf)#

Check whether the required models for the benchmark are present in the expected directories.

Parameters:
  • surrogate (str) – The name of the surrogate model to check.

  • conf (dict) – The configuration dictionary.

Raises:

FileNotFoundError – If any required models are missing.

Return type:

None

codes.benchmark.bench_utils.clean_metrics(metrics, conf)#

Clean the metrics dictionary to remove problematic entries.

Parameters:
  • metrics (dict) – The benchmark metrics.

  • conf (dict) – The configuration dictionary.

Returns:

The cleaned metrics dictionary.

Return type:

dict

codes.benchmark.bench_utils.convert_dict_to_scientific_notation(d, precision=8)#

Convert all numerical values in a dictionary to scientific notation.

Parameters:
  • d (dict) – The input dictionary.

  • precision (int, optional) – The number of digits used in the scientific notation. Defaults to 8.

Returns:

The dictionary with numerical values in scientific notation.

Return type:

dict
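
A minimal sketch; the exact string formatting depends on the implementation:

```python
from codes.benchmark import convert_dict_to_scientific_notation

metrics = {"mse": 0.000123, "count": 10500}
print(convert_dict_to_scientific_notation(metrics, precision=3))
```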

codes.benchmark.bench_utils.convert_to_standard_types(data)#

Recursively convert data to standard types that can be serialized to YAML.

Parameters:

data – The data to convert.

Returns:

The converted data.

codes.benchmark.bench_utils.count_trainable_parameters(model)#

Count the number of trainable parameters in the model.

Parameters:

model (torch.nn.Module) – The PyTorch model.

Returns:

The number of trainable parameters.

Return type:

int
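
A minimal sketch with a toy PyTorch module:

```python
import torch

from codes.benchmark import count_trainable_parameters

model = torch.nn.Linear(10, 2)  # 10 * 2 weights + 2 biases
print(count_trainable_parameters(model))  # expected: 22
```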

codes.benchmark.bench_utils.discard_numpy_entries(d)#

Recursively remove dictionary entries that contain NumPy arrays.

Parameters:

d (dict) – The input dictionary.

Returns:

A new dictionary without entries containing NumPy arrays.

Return type:

dict

codes.benchmark.bench_utils.flatten_dict(d, parent_key='', sep=' - ')#

Flatten a nested dictionary.

Parameters:
  • d (dict) – The dictionary to flatten.

  • parent_key (str) – The base key string.

  • sep (str) – The separator between keys.

Returns:

Flattened dictionary with composite keys.

Return type:

dict
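
A short sketch of the composite keys produced with the default separator:

```python
from codes.benchmark import flatten_dict

nested = {"accuracy": {"mse": 1e-4, "mae": 1e-2}, "timing": {"mean": 0.5}}
print(flatten_dict(nested))
# expected: {'accuracy - mse': 0.0001, 'accuracy - mae': 0.01, 'timing - mean': 0.5}
```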

codes.benchmark.bench_utils.format_seconds(seconds)#

Format a duration given in seconds as hh:mm:ss.

Parameters:

seconds (int) – The duration in seconds.

Returns:

The formatted duration string.

Return type:

str
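
A one-line sketch; exact zero-padding may differ:

```python
from codes.benchmark import format_seconds

print(format_seconds(3725))  # 3725 s is 1 h 2 min 5 s, i.e. "01:02:05"
```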

codes.benchmark.bench_utils.format_time(mean_time, std_time)#

Format mean and std time consistently in ns, µs, ms, or s.

Parameters:
  • mean_time – The mean time.

  • std_time – The standard deviation of the time.

Returns:

The formatted time string.

Return type:

str
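
A short sketch; the input unit is assumed to be seconds, which the docstring does not state:

```python
from codes.benchmark import format_time

print(format_time(3.2e-4, 1.5e-5))  # both values rendered in a common unit
```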

codes.benchmark.bench_utils.get_model_config(surr_name, config)#

Get the model configuration for a specific surrogate model from the dataset folder. Returns an empty dictionary if the configuration file is not found.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • config (dict) – The configuration dictionary.

Returns:

The model configuration dictionary.

Return type:

dict

codes.benchmark.bench_utils.get_required_models_list(surrogate, conf)#

Generate a list of required models based on the configuration settings.

Parameters:
  • surrogate (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

Returns:

A list of required model names.

Return type:

list

codes.benchmark.bench_utils.get_surrogate(surrogate_name)#

Check if the surrogate model exists.

Parameters:

surrogate_name (str) – The name of the surrogate model.

Returns:

The surrogate model class if it exists, otherwise None.

Return type:

SurrogateModel | None
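
A short guard-clause sketch; the surrogate name is hypothetical:

```python
from codes.benchmark import get_surrogate

surrogate_class = get_surrogate("FullyConnected")  # hypothetical name
if surrogate_class is None:
    raise ValueError("Unknown surrogate model: FullyConnected")
```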

codes.benchmark.bench_utils.load_model(model, training_id, surr_name, model_identifier)#

Load a trained surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • training_id (str) – The training identifier.

  • surr_name (str) – The name of the surrogate model.

  • model_identifier (str) – The identifier of the model (e.g., ‘main’).

Return type:

Module

Returns:

The loaded surrogate model.
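
A loading sketch; the identifiers are hypothetical and the no-argument constructor is an assumption:

```python
from codes.benchmark import get_surrogate, load_model

SurrogateClass = get_surrogate("FullyConnected")  # hypothetical name
if SurrogateClass is not None:
    model = SurrogateClass()  # constructor arguments may be required in practice
    model = load_model(model, "training_1", "FullyConnected", "main")
```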

codes.benchmark.bench_utils.make_comparison_csv(metrics, config)#

Generate a CSV file comparing metrics for different surrogate models.

Parameters:
  • metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.bench_utils.measure_memory_footprint(model, inputs)#

Measure the memory footprint of the model during the forward and backward pass.

Parameters:
  • model (torch.nn.Module) – The PyTorch model.

  • inputs (tuple) – The input data for the model.

Returns:

A dictionary containing memory footprint measurements.

Return type:

dict

codes.benchmark.bench_utils.read_yaml_config(config_path)#

Read the YAML configuration file.

Parameters:

config_path (str) – Path to the YAML configuration file.

Returns:

The configuration dictionary.

Return type:

dict
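
A minimal sketch; the path is hypothetical:

```python
from codes.benchmark import read_yaml_config

conf = read_yaml_config("config.yaml")
print(conf.keys())
```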

codes.benchmark.bench_utils.write_metrics_to_yaml(surr_name, conf, metrics)#

Write the benchmark metrics to a YAML file.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • metrics (dict) – The benchmark metrics.

Return type:

None

Module contents#
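
The functions below mirror the submodule functions documented above and are available directly at the package level. A minimal sketch, assuming the package re-exports the same objects (as the duplicated documentation suggests):

```python
from codes.benchmark import run_benchmark
from codes.benchmark.bench_fcts import run_benchmark as run_benchmark_fcts

assert run_benchmark is run_benchmark_fcts  # assumed re-export
```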

codes.benchmark.check_benchmark(conf)#

Check whether there are any configuration issues with the benchmark.

Parameters:

conf (dict) – The configuration dictionary.

Raises:
  • FileNotFoundError – If the training ID directory is missing or if the .yaml file is missing.

  • ValueError – If the configuration is missing required keys or the values do not match the training configuration.

Return type:

None

codes.benchmark.check_surrogate(surrogate, conf)#

Check whether the required models for the benchmark are present in the expected directories.

Parameters:
  • surrogate (str) – The name of the surrogate model to check.

  • conf (dict) – The configuration dictionary.

Raises:

FileNotFoundError – If any required models are missing.

Return type:

None

codes.benchmark.clean_metrics(metrics, conf)#

Clean the metrics dictionary to remove problematic entries.

Parameters:
  • metrics (dict) – The benchmark metrics.

  • conf (dict) – The configuration dictionary.

Returns:

The cleaned metrics dictionary.

Return type:

dict

codes.benchmark.compare_MAE(metrics, config)#

Compare the MAE of different surrogate models over the course of training.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_UQ(all_metrics, config)#

Compare the uncertainty quantification (UQ) metrics of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_batchsize(all_metrics, config)#

Compare the batch size training errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_dynamic_accuracy(metrics, config)#

Compare the dynamic accuracy (correlation between data gradients and prediction errors) of different surrogate models.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_extrapolation(all_metrics, config)#

Compare the extrapolation errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_inference_time(metrics, config, save=True)#

Compare the mean inference time of different surrogate models.

Parameters:
  • metrics (dict[str, dict]) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.compare_interpolation(all_metrics, config)#

Compare the interpolation errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_main_losses(metrics, config)#

Compare the training and test losses of the main models for different surrogate models.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_models(metrics, config)#
codes.benchmark.compare_relative_errors(metrics, config)#

Compare the relative errors over time for different surrogate models.

Parameters:
  • metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.compare_sparse(all_metrics, config)#

Compare the sparse training errors of different surrogate models.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.convert_dict_to_scientific_notation(d, precision=8)#

Convert all numerical values in a dictionary to scientific notation.

Parameters:
  • d (dict) – The input dictionary.

  • precision (int, optional) – The number of digits used in the scientific notation. Defaults to 8.

Returns:

The dictionary with numerical values in scientific notation.

Return type:

dict

codes.benchmark.convert_to_standard_types(data)#

Recursively convert data to standard types that can be serialized to YAML.

Parameters:

data – The data to convert.

Returns:

The converted data.

codes.benchmark.count_trainable_parameters(model)#

Count the number of trainable parameters in the model.

Parameters:

model (torch.nn.Module) – The PyTorch model.

Returns:

The number of trainable parameters.

Return type:

int

codes.benchmark.discard_numpy_entries(d)#

Recursively remove dictionary entries that contain NumPy arrays.

Parameters:

d (dict) – The input dictionary.

Returns:

A new dictionary without entries containing NumPy arrays.

Return type:

dict

codes.benchmark.evaluate_UQ(model, surr_name, test_loader, timesteps, conf, labels=None)#

Evaluate the uncertainty quantification (UQ) performance of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

  • labels (list, optional) – The labels for the chemical species.

Returns:

A dictionary containing UQ metrics.

Return type:

dict

codes.benchmark.evaluate_accuracy(model, surr_name, test_loader, conf, labels=None)#

Evaluate the accuracy of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

  • labels (list, optional) – The labels for the chemical species.

Returns:

A dictionary containing accuracy metrics.

Return type:

dict

codes.benchmark.evaluate_batchsize(model, surr_name, test_loader, timesteps, conf)#

Evaluate the performance of the surrogate model with different batch sizes.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing batch size training metrics.

Return type:

dict

codes.benchmark.evaluate_compute(model, surr_name, test_loader, conf)#

Evaluate the computational resource requirements of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing model complexity metrics.

Return type:

dict

codes.benchmark.evaluate_dynamic_accuracy(model, surr_name, test_loader, conf, species_names=None)#

Evaluate the gradients of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

  • species_names (list, optional) – The names of the chemical species.

Returns:

A dictionary containing gradient metrics.

Return type:

dict

codes.benchmark.evaluate_extrapolation(model, surr_name, test_loader, timesteps, conf)#

Evaluate the extrapolation performance of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing extrapolation metrics.

Return type:

dict

codes.benchmark.evaluate_interpolation(model, surr_name, test_loader, timesteps, conf)#

Evaluate the interpolation performance of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing interpolation metrics.

Return type:

dict

codes.benchmark.evaluate_sparse(model, surr_name, test_loader, timesteps, n_train_samples, conf)#

Evaluate the performance of the surrogate model with sparse training data.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • timesteps (np.ndarray) – The timesteps array.

  • n_train_samples (int) – The number of training samples in the full dataset.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing sparse training metrics.

Return type:

dict

codes.benchmark.flatten_dict(d, parent_key='', sep=' - ')#

Flatten a nested dictionary.

Parameters:
  • d (dict) – The dictionary to flatten.

  • parent_key (str) – The base key string.

  • sep (str) – The separator between keys.

Returns:

Flattened dictionary with composite keys.

Return type:

dict

codes.benchmark.format_seconds(seconds)#

Format a duration given in seconds as hh:mm:ss.

Parameters:

seconds (int) – The duration in seconds.

Returns:

The formatted duration string.

Return type:

str

codes.benchmark.format_time(mean_time, std_time)#

Format mean and std time consistently in ns, µs, ms, or s.

Parameters:
  • mean_time – The mean time.

  • std_time – The standard deviation of the time.

Returns:

The formatted time string.

Return type:

str

codes.benchmark.get_custom_palette(num_colors)#

Returns a list of colors sampled from a custom color palette.

Parameters:

num_colors (int) – The number of colors needed.

Returns:

A list of RGBA color tuples.

Return type:

list

codes.benchmark.get_model_config(surr_name, config)#

Get the model configuration for a specific surrogate model from the dataset folder. Returns an empty dictionary if the configuration file is not found.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • config (dict) – The configuration dictionary.

Returns:

The model configuration dictionary.

Return type:

dict

codes.benchmark.get_required_models_list(surrogate, conf)#

Generate a list of required models based on the configuration settings.

Parameters:
  • surrogate (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

Returns:

A list of required model names.

Return type:

list

codes.benchmark.get_surrogate(surrogate_name)#

Check if the surrogate model exists.

Parameters:

surrogate_name (str) – The name of the surrogate model.

Returns:

The surrogate model class if it exists, otherwise None.

Return type:

SurrogateModel | None

codes.benchmark.inference_time_bar_plot(surrogates, means, stds, config, save=True)#

Plot the mean inference time with standard deviation for different surrogate models.

Parameters:
  • surrogates (List[str]) – List of surrogate model names.

  • means (List[float]) – List of mean inference times for each surrogate model.

  • stds (List[float]) – List of standard deviation of inference times for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.int_ext_sparse(all_metrics, config)#

Make one comparative plot of the interpolation, extrapolation, and sparse training errors.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.load_model(model, training_id, surr_name, model_identifier)#

Load a trained surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • training_id (str) – The training identifier.

  • surr_name (str) – The name of the surrogate model.

  • model_identifier (str) – The identifier of the model (e.g., ‘main’).

Return type:

Module

Returns:

The loaded surrogate model.

codes.benchmark.make_comparison_csv(metrics, config)#

Generate a CSV file comparing metrics for different surrogate models.

Parameters:
  • metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.measure_memory_footprint(model, inputs)#

Measure the memory footprint of the model during the forward and backward pass.

Parameters:
  • model (torch.nn.Module) – The PyTorch model.

  • inputs (tuple) – The input data for the model.

Returns:

A dictionary containing memory footprint measurements.

Return type:

dict

codes.benchmark.plot_MAE_comparison(MAEs, labels, config, save=True)#

Plot the MAE for different surrogate models.

Parameters:
  • MAEs (tuple) – Tuple of MAE arrays for each surrogate model.

  • labels (tuple) – Tuple of labels for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

codes.benchmark.plot_MAE_comparison_train_duration(MAEs, labels, train_durations, config, save=True)#

Plot the MAE for different surrogate models alongside their training durations.

Parameters:
  • MAEs (tuple) – Tuple of MAE arrays for each surrogate model.

  • labels (tuple) – Tuple of labels for each surrogate model.

  • train_durations (tuple) – Tuple of training durations for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

codes.benchmark.plot_average_errors_over_time(surr_name, conf, errors, metrics, timesteps, mode, save=False)#

Plot the errors over time for different modes (interpolation, extrapolation, sparse, batchsize).

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • errors (np.ndarray) – Errors array of shape [N_metrics, n_timesteps].

  • metrics (np.ndarray) – Metrics array of shape [N_metrics].

  • timesteps (np.ndarray) – Timesteps array.

  • mode (str) – The mode of evaluation (‘interpolation’, ‘extrapolation’, ‘sparse’, ‘batchsize’).

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.plot_average_uncertainty_over_time(surr_name, conf, errors_time, preds_std, timesteps, save=False)#

Plot the average uncertainty over time.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • errors_time (np.ndarray) – Prediction errors over time.

  • preds_std (np.ndarray) – Standard deviation over time of predictions from the ensemble of models.

  • timesteps (np.ndarray) – Timesteps array.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.plot_comparative_dynamic_correlation_heatmaps(gradients, errors, avg_correlations, max_grad, max_err, max_count, config, save=True)#

Plot comparative heatmaps of correlation between gradient and prediction errors for multiple surrogate models.

Parameters:
  • gradients (dict[str, np.ndarray]) – Dictionary of gradients from the ensemble of models.

  • errors (dict[str, np.ndarray]) – Dictionary of prediction errors.

  • avg_correlations (dict[str, float]) – Dictionary of average correlations between gradients and prediction errors.

  • max_grad (dict[str, float]) – Dictionary of maximum gradient values for axis scaling across models.

  • max_err (dict[str, float]) – Dictionary of maximum error values for axis scaling across models.

  • max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.plot_comparative_error_correlation_heatmaps(preds_std, errors, avg_correlations, axis_max, max_count, config, save=True)#

Plot comparative heatmaps of correlation between predictive uncertainty and prediction errors for multiple surrogate models.

Parameters:
  • preds_std (dict[str, np.ndarray]) – Dictionary of standard deviation of predictions from the ensemble of models.

  • errors (dict[str, np.ndarray]) – Dictionary of prediction errors.

  • avg_correlations (dict[str, float]) – Dictionary of average correlations between predictive uncertainty and prediction errors.

  • axis_max (dict[str, float]) – Dictionary of maximum values for axis scaling across models.

  • max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.

  • config (dict) – Configuration dictionary.

  • save (bool, optional) – Whether to save the plot. Defaults to True.

Return type:

None

Returns:

None

codes.benchmark.plot_dynamic_correlation(surr_name, conf, gradients, errors, save=False)#

Plot the correlation between the gradients of the data and the prediction errors.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • gradients (np.ndarray) – The gradients of the data.

  • errors (np.ndarray) – The prediction errors.

  • save (bool) – Whether to save the plot.

codes.benchmark.plot_dynamic_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.0001, xcut_percent=0.003)#

Plot the correlation between predictive uncertainty and prediction errors using a heatmap.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • errors (np.ndarray) – Prediction errors.

  • average_correlation (float) – The average correlation between gradients and prediction errors (Pearson correlation).

  • save (bool, optional) – Whether to save the plot as a file.

  • threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 1e-4.

  • xcut_percent (float, optional) – The fraction used to determine the x-axis cutoff of the heatmap. Default is 0.003.

Return type:

None

codes.benchmark.plot_error_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.01)#

Plot the correlation between predictive uncertainty and prediction errors using a heatmap.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • errors (np.ndarray) – Prediction errors.

  • average_correlation (float) – The average correlation between predictive uncertainty and prediction errors (Pearson correlation).

  • save (bool, optional) – Whether to save the plot as a file.

  • threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 0.01.

Return type:

None

codes.benchmark.plot_error_distribution_comparative(errors, conf, save=True)#

Plot the comparative distribution of errors for each surrogate model as a smoothed histogram plot.

Parameters:
  • conf (dict) – The configuration dictionary.

  • errors (dict) – Dictionary containing numpy arrays of shape [num_samples, num_timesteps, num_chemicals] for each model.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.plot_error_distribution_per_chemical(surr_name, conf, errors, chemical_names=None, num_chemicals=10, save=True)#

Plot the distribution of errors for each chemical as a smoothed histogram plot.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • errors (np.ndarray) – Errors array of shape [num_samples, num_timesteps, num_chemicals].

  • chemical_names (list, optional) – List of chemical names for labeling the lines.

  • num_chemicals (int, optional) – Number of chemicals to plot. Default is 10.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.plot_example_predictions_with_uncertainty(surr_name, conf, preds_mean, preds_std, targets, timesteps, example_idx=0, num_chemicals=100, labels=None, save=False)#

Plot example predictions with uncertainty.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_mean (np.ndarray) – Mean predictions from the ensemble of models.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • targets (np.ndarray) – True targets.

  • timesteps (np.ndarray) – Timesteps array.

  • example_idx (int, optional) – Index of the example to plot. Default is 0.

  • num_chemicals (int, optional) – Number of chemicals to plot. Default is 100.

  • labels (list, optional) – List of labels for the chemicals.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.plot_generalization_error_comparison(surrogates, metrics_list, model_errors_list, xlabel, filename, config, save=True, xlog=False)#

Plot the generalization errors of different surrogate models.

Parameters:
  • surrogates (list) – List of surrogate model names.

  • metrics_list (list[np.array]) – List of numpy arrays containing the metrics for each surrogate model.

  • model_errors_list (list[np.array]) – List of numpy arrays containing the errors for each surrogate model.

  • xlabel (str) – Label for the x-axis.

  • filename (str) – Filename to save the plot.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

  • xlog (bool) – Whether to use a log scale for the x-axis.

Return type:

None

Returns:

None

codes.benchmark.plot_generalization_errors(surr_name, conf, metrics, model_errors, mode, save=False)#

Plot the generalization errors of a model for various metrics.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • metrics (np.ndarray) – The metrics (e.g., intervals, cutoffs, batch sizes, number of training samples).

  • model_errors (np.ndarray) – The model errors.

  • mode (str) – The mode of generalization (“interpolation”, “extrapolation”, “sparse”, “batchsize”).

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.plot_loss_comparison(train_losses, test_losses, labels, config, save=True)#

Plot the training and test losses for different surrogate models.

Parameters:
  • train_losses (tuple) – Tuple of training loss arrays for each surrogate model.

  • test_losses (tuple) – Tuple of test loss arrays for each surrogate model.

  • labels (tuple) – Tuple of labels for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.plot_losses(loss_histories, labels, title='Losses', save=False, conf=None, surr_name=None, mode='main')#

Plot the loss trajectories for the training of multiple models.

Parameters:
  • loss_histories (tuple[array, ...]) – Tuple of loss history arrays.

  • labels (tuple[str, ...]) – Tuple of labels for each loss history.

  • title (str) – Title of the plot.

  • save (bool) – Whether to save the plot as an image file.

  • conf (Optional[dict]) – The configuration dictionary.

  • surr_name (Optional[str]) – The name of the surrogate model.

  • mode (str) – The mode of the training.

Return type:

None

codes.benchmark.plot_relative_errors(mean_errors, median_errors, timesteps, config, save=True)#

Plot the relative errors over time for different surrogate models.

Parameters:
  • mean_errors (dict) – dictionary containing the mean relative errors for each surrogate model.

  • median_errors (dict) – dictionary containing the median relative errors for each surrogate model.

  • timesteps (np.ndarray) – Array of timesteps.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.plot_relative_errors_over_time(surr_name, conf, relative_errors, title, save=False)#

Plot the mean and median relative errors over time with shaded regions for the 50th, 90th, and 99th percentiles.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • relative_errors (np.ndarray) – The relative errors of the model.

  • title (str) – The title of the plot.

  • save (bool) – Whether to save the plot.

Return type:

None

codes.benchmark.plot_surr_losses(model, surr_name, conf, timesteps)#

Plot the training and test losses for the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • timesteps (np.ndarray) – The timesteps array.

Return type:

None

codes.benchmark.plot_uncertainty_over_time_comparison(uncertainties, absolute_errors, timesteps, config, save=True)#

Plot the uncertainty over time for different surrogate models.

Parameters:
  • uncertainties (dict) – Dictionary containing the uncertainties for each surrogate model.

  • absolute_errors (dict) – Dictionary containing the absolute errors for each surrogate model.

  • timesteps (np.ndarray) – Array of timesteps.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.plot_uncertainty_vs_errors(surr_name, conf, preds_std, errors, save=False)#

Plot the correlation between predictive uncertainty and prediction errors.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.

  • errors (np.ndarray) – Prediction errors.

  • save (bool, optional) – Whether to save the plot as a file.

Return type:

None

codes.benchmark.read_yaml_config(config_path)#

Read the YAML configuration file.

Parameters:

config_path (str) – Path to the YAML configuration file.

Returns:

The configuration dictionary.

Return type:

dict

codes.benchmark.rel_errors_and_uq(metrics, config, save=True)#

Create a figure with two subplots: relative errors over time and uncertainty over time for different surrogate models.

Parameters:
  • metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

  • save (bool) – Whether to save the plot.

Return type:

None

Returns:

None

codes.benchmark.run_benchmark(surr_name, surrogate_class, conf)#

Run benchmarks for a given surrogate model.

Parameters:
  • surr_name (str) – The name of the surrogate model to benchmark.

  • surrogate_class – The class of the surrogate model.

  • conf (dict) – The configuration dictionary.

Returns:

A dictionary containing all relevant metrics for the given model.

Return type:

dict

codes.benchmark.save_plot(plt, filename, conf, surr_name='', dpi=300, base_dir='plots', increase_count=False)#

Save the plot to a file, creating necessary directories if they don’t exist.

Parameters:
  • plt (matplotlib.pyplot) – The plot object to save.

  • filename (str) – The desired filename for the plot.

  • conf (dict) – The configuration dictionary.

  • surr_name (str) – The name of the surrogate model.

  • dpi (int) – The resolution of the saved plot.

  • base_dir (str, optional) – The base directory where plots will be saved. Default is “plots”.

  • increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is False.

Raises:

ValueError – If the configuration dictionary does not contain the required keys.

Return type:

None

codes.benchmark.save_plot_counter(filename, directory, increase_count=True)#

Save a plot with an incremented filename if a file with the same name already exists.

Parameters:
  • filename (str) – The desired filename for the plot.

  • directory (str) – The directory to save the plot in.

  • increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is True.

Returns:

The full path to the saved plot.

Return type:

str

codes.benchmark.tabular_comparison(all_metrics, config)#

Compare the metrics of different surrogate models in a tabular format.

Parameters:
  • all_metrics (dict) – dictionary containing the benchmark metrics for each surrogate model.

  • config (dict) – Configuration dictionary.

Return type:

None

Returns:

None

codes.benchmark.time_inference(model, surr_name, test_loader, conf, n_test_samples, n_runs=5)#

Time the inference of the surrogate model.

Parameters:
  • model – Instance of the surrogate model class.

  • surr_name (str) – The name of the surrogate model.

  • test_loader (DataLoader) – The DataLoader object containing the test data.

  • conf (dict) – The configuration dictionary.

  • n_test_samples (int) – The number of test samples.

  • n_runs (int, optional) – Number of times to run the inference for timing.

Returns:

A dictionary containing timing metrics.

Return type:

dict

codes.benchmark.write_metrics_to_yaml(surr_name, conf, metrics)#

Write the benchmark metrics to a YAML file.

Parameters:
  • surr_name (str) – The name of the surrogate model.

  • conf (dict) – The configuration dictionary.

  • metrics (dict) – The benchmark metrics.

Return type:

None