codes.benchmark package#
Submodules#
codes.benchmark.bench_fcts module#
- codes.benchmark.bench_fcts.compare_MAE(metrics, config)#
Compare the MAE of different surrogate models over the course of training.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_UQ(all_metrics, config)#
Compare the uncertainty quantification (UQ) metrics of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_batchsize(all_metrics, config)#
Compare the batch size training errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_dynamic_accuracy(metrics, config)#
Compare the gradients of different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_extrapolation(all_metrics, config)#
Compare the extrapolation errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_inference_time(metrics, config, save=True)#
Compare the mean inference time of different surrogate models.
- Parameters:
metrics (dict[str, dict]) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_interpolation(all_metrics, config)#
Compare the interpolation errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_main_losses(metrics, config)#
Compare the training and test losses of the main models for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_models(metrics, config)#
- codes.benchmark.bench_fcts.compare_relative_errors(metrics, config)#
Compare the relative errors over time for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.compare_sparse(all_metrics, config)#
Compare the sparse training errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.evaluate_UQ(model, surr_name, test_loader, timesteps, conf, labels=None)#
Evaluate the uncertainty quantification (UQ) performance of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
labels (list, optional) – The labels for the chemical species.
- Returns:
A dictionary containing UQ metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_accuracy(model, surr_name, test_loader, conf, labels=None)#
Evaluate the accuracy of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
labels (list, optional) – The labels for the chemical species.
- Returns:
A dictionary containing accuracy metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_batchsize(model, surr_name, test_loader, timesteps, conf)#
Evaluate the performance of the surrogate model with different batch sizes.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing batch size training metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_compute(model, surr_name, test_loader, conf)#
Evaluate the computational resource requirements of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing model complexity metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_dynamic_accuracy(model, surr_name, test_loader, conf, species_names=None)#
Evaluate the gradients of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
species_names (list, optional) – The names of the chemical species.
- Returns:
A dictionary containing gradient metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_extrapolation(model, surr_name, test_loader, timesteps, conf)#
Evaluate the extrapolation performance of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing extrapolation metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_interpolation(model, surr_name, test_loader, timesteps, conf)#
Evaluate the interpolation performance of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing interpolation metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.evaluate_sparse(model, surr_name, test_loader, timesteps, n_train_samples, conf)#
Evaluate the performance of the surrogate model with sparse training data.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
n_train_samples (int) – The number of training samples in the full dataset.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing sparse training metrics.
- Return type:
dict
- codes.benchmark.bench_fcts.run_benchmark(surr_name, surrogate_class, conf)#
Run benchmarks for a given surrogate model.
- Parameters:
surr_name (str) – The name of the surrogate model to benchmark.
surrogate_class – The class of the surrogate model.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing all relevant metrics for the given model.
- Return type:
dict
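A minimal driver sketch showing how run_benchmark composes with the package's other helpers. The "config.yaml" path and the "surrogates" config key are illustrative assumptions, not part of the documented API:

```python
# Hypothetical benchmark driver; the config path and the "surrogates"
# key are assumptions for illustration only.
from codes.benchmark import (
    check_benchmark,
    get_surrogate,
    read_yaml_config,
    run_benchmark,
)

conf = read_yaml_config("config.yaml")
check_benchmark(conf)  # raises FileNotFoundError/ValueError on config issues

all_metrics = {}
for surr_name in conf["surrogates"]:
    surrogate_class = get_surrogate(surr_name)
    if surrogate_class is None:
        continue  # skip unknown surrogate names
    all_metrics[surr_name] = run_benchmark(surr_name, surrogate_class, conf)
```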
- codes.benchmark.bench_fcts.tabular_comparison(all_metrics, config)#
Compare the metrics of different surrogate models in a tabular format.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_fcts.time_inference(model, surr_name, test_loader, conf, n_test_samples, n_runs=5)#
Time the inference of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
n_test_samples (int) – The number of test samples.
n_runs (int, optional) – Number of times to run the inference for timing.
- Returns:
A dictionary containing timing metrics.
- Return type:
dict
codes.benchmark.bench_plots module#
- codes.benchmark.bench_plots.get_custom_palette(num_colors)#
Returns a list of colors sampled from a custom color palette.
- Parameters:
num_colors (int) – The number of colors needed.
- Returns:
A list of RGBA color tuples.
- Return type:
list
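One way a function with this contract could be implemented; the underlying colormap ("viridis") is an assumed stand-in, since the actual custom palette is not documented:

```python
import matplotlib
import numpy as np


def get_custom_palette(num_colors):
    # Sample evenly spaced RGBA tuples from a colormap; "viridis" is an
    # assumption standing in for the package's actual custom palette.
    cmap = matplotlib.colormaps["viridis"]
    return [cmap(x) for x in np.linspace(0.0, 1.0, num_colors)]


colors = get_custom_palette(4)  # four RGBA tuples
```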
- codes.benchmark.bench_plots.inference_time_bar_plot(surrogates, means, stds, config, save=True)#
Plot the mean inference time with standard deviation for different surrogate models.
- Parameters:
surrogates (List[str]) – List of surrogate model names.
means (List[float]) – List of mean inference times for each surrogate model.
stds (List[float]) – List of standard deviation of inference times for each surrogate model.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.int_ext_sparse(all_metrics, config)#
Make a single comparative plot of the interpolation, extrapolation, and sparse training errors.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_MAE_comparison(MAEs, labels, config, save=True)#
Plot the MAE for different surrogate models.
- Parameters:
MAEs (tuple) – Tuple of MAE arrays for each surrogate model.
labels (tuple) – Tuple of labels for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- codes.benchmark.bench_plots.plot_MAE_comparison_train_duration(MAEs, labels, train_durations, config, save=True)#
Plot the MAE against training duration for different surrogate models.
- Parameters:
MAEs (tuple) – Tuple of MAE arrays for each surrogate model.
labels (tuple) – Tuple of labels for each surrogate model.
train_durations (tuple) – Tuple of training durations for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- codes.benchmark.bench_plots.plot_average_errors_over_time(surr_name, conf, errors, metrics, timesteps, mode, save=False)#
Plot the errors over time for different modes (interpolation, extrapolation, sparse, batchsize).
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
errors (np.ndarray) – Errors array of shape [N_metrics, n_timesteps].
metrics (np.ndarray) – Metrics array of shape [N_metrics].
timesteps (np.ndarray) – Timesteps array.
mode (str) – The mode of evaluation (‘interpolation’, ‘extrapolation’, ‘sparse’, ‘batchsize’).
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.bench_plots.plot_average_uncertainty_over_time(surr_name, conf, errors_time, preds_std, timesteps, save=False)#
Plot the average uncertainty over time.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
errors_time (np.ndarray) – Prediction errors over time.
preds_std (np.ndarray) – Standard deviation over time of predictions from the ensemble of models.
timesteps (np.ndarray) – Timesteps array.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.bench_plots.plot_comparative_dynamic_correlation_heatmaps(gradients, errors, avg_correlations, max_grad, max_err, max_count, config, save=True)#
Plot comparative heatmaps of correlation between gradient and prediction errors for multiple surrogate models.
- Parameters:
gradients (dict[str, np.ndarray]) – Dictionary of gradients from the ensemble of models.
errors (dict[str, np.ndarray]) – Dictionary of prediction errors.
avg_correlations (dict[str, float]) – Dictionary of average correlations between gradients and prediction errors.
max_grad (dict[str, float]) – Dictionary of maximum gradient values for axis scaling across models.
max_err (dict[str, float]) – Dictionary of maximum error values for axis scaling across models.
max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_comparative_error_correlation_heatmaps(preds_std, errors, avg_correlations, axis_max, max_count, config, save=True)#
Plot comparative heatmaps of correlation between predictive uncertainty and prediction errors for multiple surrogate models.
- Parameters:
preds_std (dict[str, np.ndarray]) – Dictionary of standard deviation of predictions from the ensemble of models.
errors (dict[str, np.ndarray]) – Dictionary of prediction errors.
avg_correlations (dict[str, float]) – Dictionary of average correlations between predictive uncertainty and prediction errors.
axis_max (dict[str, float]) – Dictionary of maximum values for axis scaling across models.
max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_dynamic_correlation(surr_name, conf, gradients, errors, save=False)#
Plot the correlation between the gradients of the data and the prediction errors.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
gradients (np.ndarray) – The gradients of the data.
errors (np.ndarray) – The prediction errors.
save (bool) – Whether to save the plot.
- codes.benchmark.bench_plots.plot_dynamic_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.0001, xcut_percent=0.003)#
Plot the correlation between predictive uncertainty and prediction errors using a heatmap.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
errors (np.ndarray) – Prediction errors.
average_correlation (float) – The average correlation between gradients and prediction errors (Pearson correlation).
save (bool, optional) – Whether to save the plot as a file.
threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 1e-4.
xcut_percent (float, optional) – Fraction of the x-axis extent to cut off in the heatmap. Default is 0.003.
- Return type:
None
- codes.benchmark.bench_plots.plot_error_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.01)#
Plot the correlation between predictive uncertainty and prediction errors using a heatmap.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
errors (np.ndarray) – Prediction errors.
average_correlation (float) – The average correlation between predictive uncertainty and prediction errors (Pearson correlation).
save (bool, optional) – Whether to save the plot as a file.
threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 0.01.
- Return type:
None
- codes.benchmark.bench_plots.plot_error_distribution_comparative(errors, conf, save=True)#
Plot the comparative distribution of errors for each surrogate model as a smoothed histogram plot.
- Parameters:
errors (dict) – Dictionary containing numpy arrays of shape [num_samples, num_timesteps, num_chemicals] for each model.
conf (dict) – The configuration dictionary.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.bench_plots.plot_error_distribution_per_chemical(surr_name, conf, errors, chemical_names=None, num_chemicals=10, save=True)#
Plot the distribution of errors for each chemical as a smoothed histogram plot.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
errors (np.ndarray) – Errors array of shape [num_samples, num_timesteps, num_chemicals].
chemical_names (list, optional) – List of chemical names for labeling the lines.
num_chemicals (int, optional) – Number of chemicals to plot. Default is 10.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.bench_plots.plot_example_predictions_with_uncertainty(surr_name, conf, preds_mean, preds_std, targets, timesteps, example_idx=0, num_chemicals=100, labels=None, save=False)#
Plot example predictions with uncertainty.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_mean (np.ndarray) – Mean predictions from the ensemble of models.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
targets (np.ndarray) – True targets.
timesteps (np.ndarray) – Timesteps array.
example_idx (int, optional) – Index of the example to plot. Default is 0.
num_chemicals (int, optional) – Number of chemicals to plot. Default is 100.
labels (list, optional) – List of labels for the chemicals.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.bench_plots.plot_generalization_error_comparison(surrogates, metrics_list, model_errors_list, xlabel, filename, config, save=True, xlog=False)#
Plot the generalization errors of different surrogate models.
- Parameters:
surrogates (list) – List of surrogate model names.
metrics_list (list[np.array]) – List of numpy arrays containing the metrics for each surrogate model.
model_errors_list (list[np.array]) – List of numpy arrays containing the errors for each surrogate model.
xlabel (str) – Label for the x-axis.
filename (str) – Filename to save the plot.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
xlog (bool) – Whether to use a log scale for the x-axis.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_generalization_errors(surr_name, conf, metrics, model_errors, mode, save=False)#
Plot the generalization errors of a model for various metrics.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
metrics (np.ndarray) – The metrics (e.g., intervals, cutoffs, batch sizes, number of training samples).
model_errors (np.ndarray) – The model errors.
mode (str) – The mode of generalization (“interpolation”, “extrapolation”, “sparse”, “batchsize”).
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_loss_comparison(train_losses, test_losses, labels, config, save=True)#
Plot the training and test losses for different surrogate models.
- Parameters:
train_losses (tuple) – Tuple of training loss arrays for each surrogate model.
test_losses (tuple) – Tuple of test loss arrays for each surrogate model.
labels (tuple) – Tuple of labels for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_losses(loss_histories, labels, title='Losses', save=False, conf=None, surr_name=None, mode='main')#
Plot the loss trajectories for the training of multiple models.
- Parameters:
loss_histories (tuple[array, ...]) – List of loss history arrays.
labels (tuple[str, ...]) – List of labels for each loss history.
title (str) – Title of the plot.
save (bool) – Whether to save the plot as an image file.
conf (dict, optional) – The configuration dictionary.
surr_name (str, optional) – The name of the surrogate model.
mode (str) – The mode of the training.
- Return type:
None
- codes.benchmark.bench_plots.plot_relative_errors(mean_errors, median_errors, timesteps, config, save=True)#
Plot the relative errors over time for different surrogate models.
- Parameters:
mean_errors (dict) – Dictionary containing the mean relative errors for each surrogate model.
median_errors (dict) – Dictionary containing the median relative errors for each surrogate model.
timesteps (np.ndarray) – Array of timesteps.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_relative_errors_over_time(surr_name, conf, relative_errors, title, save=False)#
Plot the mean and median relative errors over time with shaded regions for the 50th, 90th, and 99th percentiles.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
relative_errors (np.ndarray) – The relative errors of the model.
title (str) – The title of the plot.
save (bool) – Whether to save the plot.
- Return type:
None
- codes.benchmark.bench_plots.plot_surr_losses(model, surr_name, conf, timesteps)#
Plot the training and test losses for the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
timesteps (np.ndarray) – The timesteps array.
- Return type:
None
- codes.benchmark.bench_plots.plot_uncertainty_over_time_comparison(uncertainties, absolute_errors, timesteps, config, save=True)#
Plot the uncertainty over time for different surrogate models.
- Parameters:
uncertainties (dict) – Dictionary containing the uncertainties for each surrogate model.
absolute_errors (dict) – Dictionary containing the absolute errors for each surrogate model.
timesteps (np.ndarray) – Array of timesteps.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.plot_uncertainty_vs_errors(surr_name, conf, preds_std, errors, save=False)#
Plot the correlation between predictive uncertainty and prediction errors.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
errors (np.ndarray) – Prediction errors.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.bench_plots.rel_errors_and_uq(metrics, config, save=True)#
Create a figure with two subplots: relative errors over time and uncertainty over time for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_plots.save_plot(plt, filename, conf, surr_name='', dpi=300, base_dir='plots', increase_count=False)#
Save the plot to a file, creating necessary directories if they don’t exist.
- Parameters:
plt (matplotlib.pyplot) – The plot object to save.
filename (str) – The desired filename for the plot.
conf (dict) – The configuration dictionary.
surr_name (str) – The name of the surrogate model.
dpi (int) – The resolution of the saved plot.
base_dir (str, optional) – The base directory where plots will be saved. Default is “plots”.
increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is False.
- Raises:
ValueError – If the configuration dictionary does not contain the required keys.
- Return type:
None
- codes.benchmark.bench_plots.save_plot_counter(filename, directory, increase_count=True)#
Save a plot with an incremented filename if a file with the same name already exists.
- Parameters:
filename (str) – The desired filename for the plot.
directory (str) – The directory to save the plot in.
increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is True.
- Returns:
The full path to the saved plot.
- Return type:
str
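A plausible sketch of the documented counter behavior; the real numbering scheme may differ:

```python
import os


def save_plot_counter(filename, directory, increase_count=True):
    # Resolve a collision-free path by appending _1, _2, ... before the
    # extension until no existing file matches.
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    count = 1
    while increase_count and os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}_{count}{ext}")
        count += 1
    return candidate
```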
codes.benchmark.bench_utils module#
- codes.benchmark.bench_utils.check_benchmark(conf)#
Check whether there are any configuration issues with the benchmark.
- Parameters:
conf (dict) – The configuration dictionary.
- Raises:
FileNotFoundError – If the training ID directory is missing or if the .yaml file is missing.
ValueError – If the configuration is missing required keys or the values do not match the training configuration.
- Return type:
None
- codes.benchmark.bench_utils.check_surrogate(surrogate, conf)#
Check whether the required models for the benchmark are present in the expected directories.
- Parameters:
surrogate (str) – The name of the surrogate model to check.
conf (dict) – The configuration dictionary.
- Raises:
FileNotFoundError – If any required models are missing.
- Return type:
None
- codes.benchmark.bench_utils.clean_metrics(metrics, conf)#
Clean the metrics dictionary to remove problematic entries.
- Parameters:
metrics (dict) – The benchmark metrics.
conf (dict) – The configuration dictionary.
- Returns:
The cleaned metrics dictionary.
- Return type:
dict
- codes.benchmark.bench_utils.convert_dict_to_scientific_notation(d, precision=8)#
Convert all numerical values in a dictionary to scientific notation.
- Parameters:
d (dict) – The input dictionary.
precision (int, optional) – The precision used for the scientific notation. Defaults to 8.
- Returns:
The dictionary with numerical values in scientific notation.
- Return type:
dict
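A sketch of the documented conversion, assuming precision controls the digits after the decimal point:

```python
def convert_dict_to_scientific_notation(d, precision=8):
    # Recursively format numeric values as scientific-notation strings.
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out[key] = convert_dict_to_scientific_notation(value, precision)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            out[key] = f"{value:.{precision}e}"
        else:
            out[key] = value
    return out


convert_dict_to_scientific_notation({"mse": 0.000123, "runs": 5})
# {'mse': '1.23000000e-04', 'runs': '5.00000000e+00'}
```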
- codes.benchmark.bench_utils.convert_to_standard_types(data)#
Recursively convert data to standard types that can be serialized to YAML.
- Parameters:
data – The data to convert.
- Returns:
The converted data.
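The conversion presumably targets NumPy scalars and arrays, which YAML serializers cannot handle directly; a sketch under that assumption:

```python
import numpy as np


def convert_to_standard_types(data):
    # Recursively map NumPy types (and nested containers) onto plain
    # Python types that YAML serializers accept.
    if isinstance(data, dict):
        return {k: convert_to_standard_types(v) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return [convert_to_standard_types(v) for v in data]
    if isinstance(data, np.generic):
        return data.item()  # NumPy scalar -> Python scalar
    if isinstance(data, np.ndarray):
        return data.tolist()
    return data
```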
- codes.benchmark.bench_utils.count_trainable_parameters(model)#
Count the number of trainable parameters in the model.
- Parameters:
model (torch.nn.Module) – The PyTorch model.
- Returns:
The number of trainable parameters.
- Return type:
int
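The standard PyTorch idiom for this contract:

```python
import torch


def count_trainable_parameters(model: torch.nn.Module) -> int:
    # Sum the element counts of all parameters that require gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


count_trainable_parameters(torch.nn.Linear(10, 5))  # 55 = 10*5 weights + 5 biases
```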
- codes.benchmark.bench_utils.discard_numpy_entries(d)#
Recursively remove dictionary entries that contain NumPy arrays.
- Parameters:
d (dict) – The input dictionary.
- Returns:
A new dictionary without entries containing NumPy arrays.
- Return type:
dict
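A recursive sketch matching the documented behavior:

```python
import numpy as np


def discard_numpy_entries(d):
    # Rebuild the dictionary, dropping NumPy-array values and recursing
    # into nested dictionaries.
    out = {}
    for key, value in d.items():
        if isinstance(value, np.ndarray):
            continue
        out[key] = discard_numpy_entries(value) if isinstance(value, dict) else value
    return out
```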
- codes.benchmark.bench_utils.flatten_dict(d, parent_key='', sep=' - ')#
Flatten a nested dictionary.
- Parameters:
d (dict) – The dictionary to flatten.
parent_key (str) – The base key string.
sep (str) – The separator between keys.
- Returns:
Flattened dictionary with composite keys.
- Return type:
dict
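A sketch of the documented flattening with the default " - " separator:

```python
def flatten_dict(d, parent_key="", sep=" - "):
    # Join nested keys with `sep` to build composite single-level keys.
    items = {}
    for key, value in d.items():
        composite = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            items.update(flatten_dict(value, composite, sep))
        else:
            items[composite] = value
    return items


flatten_dict({"accuracy": {"mse": 0.1, "mae": 0.05}})
# {'accuracy - mse': 0.1, 'accuracy - mae': 0.05}
```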
- codes.benchmark.bench_utils.format_seconds(seconds)#
Format a duration given in seconds as hh:mm:ss.
- Parameters:
seconds (int) – The duration in seconds.
- Returns:
The formatted duration string.
- Return type:
str
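The hh:mm:ss formatting reduces to two divmod calls:

```python
def format_seconds(seconds):
    # Split the duration into hours, minutes and seconds, zero-padding each field.
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


format_seconds(3725)  # '01:02:05'
```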
- codes.benchmark.bench_utils.format_time(mean_time, std_time)#
Format mean and std time consistently in ns, µs, ms, or s.
- Parameters:
mean_time – The mean time.
std_time – The standard deviation of the time.
- Returns:
The formatted time string.
- Return type:
str
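A sketch of unit-consistent formatting, assuming the raw times are given in nanoseconds (the input unit is not documented):

```python
def format_time(mean_time, std_time):
    # Pick the largest unit in which the mean is at least 1, then express
    # both the mean and the standard deviation in that unit.
    for unit, factor in (("s", 1e9), ("ms", 1e6), ("µs", 1e3), ("ns", 1.0)):
        if mean_time >= factor:
            return f"{mean_time / factor:.2f} {unit} ± {std_time / factor:.2f} {unit}"
    return f"{mean_time:.2f} ns ± {std_time:.2f} ns"


format_time(2.5e6, 3.1e4)  # '2.50 ms ± 0.03 ms'
```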
- codes.benchmark.bench_utils.get_model_config(surr_name, config)#
Get the model configuration for a specific surrogate model from the dataset folder. Returns an empty dictionary if the configuration file is not found.
- Parameters:
surr_name (str) – The name of the surrogate model.
config (dict) – The configuration dictionary.
- Returns:
The model configuration dictionary.
- Return type:
dict
- codes.benchmark.bench_utils.get_required_models_list(surrogate, conf)#
Generate a list of required models based on the configuration settings.
- Parameters:
surrogate (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
- Returns:
A list of required model names.
- Return type:
list
- codes.benchmark.bench_utils.get_surrogate(surrogate_name)#
Check if the surrogate model exists.
- Parameters:
surrogate_name (str) – The name of the surrogate model.
- Returns:
The surrogate model class if it exists, otherwise None.
- Return type:
SurrogateModel | None
- codes.benchmark.bench_utils.load_model(model, training_id, surr_name, model_identifier)#
Load a trained surrogate model.
- Parameters:
model – Instance of the surrogate model class.
training_id (str) – The training identifier.
surr_name (str) – The name of the surrogate model.
model_identifier (str) – The identifier of the model (e.g., ‘main’).
- Return type:
Module
- Returns:
The loaded surrogate model.
- codes.benchmark.bench_utils.make_comparison_csv(metrics, config)#
Generate a CSV file comparing metrics for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.bench_utils.measure_memory_footprint(model, inputs)#
Measure the memory footprint of the model during the forward and backward pass.
- Parameters:
model (torch.nn.Module) – The PyTorch model.
inputs (tuple) – The input data for the model.
- Returns:
A dictionary containing memory footprint measurements.
- Return type:
dict
- codes.benchmark.bench_utils.read_yaml_config(config_path)#
Read the YAML configuration file.
- Parameters:
config_path (str) – Path to the YAML configuration file.
- Returns:
The configuration dictionary.
- Return type:
dict
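This is typically a thin wrapper around PyYAML's safe loader; a sketch:

```python
import yaml  # PyYAML


def read_yaml_config(config_path):
    # Parse the YAML file into a plain dictionary.
    with open(config_path, "r") as f:
        return yaml.safe_load(f)
```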
- codes.benchmark.bench_utils.write_metrics_to_yaml(surr_name, conf, metrics)#
Write the benchmark metrics to a YAML file.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
metrics (dict) – The benchmark metrics.
- Return type:
None
Module contents#
- codes.benchmark.check_benchmark(conf)#
Check whether there are any configuration issues with the benchmark.
- Parameters:
conf (dict) – The configuration dictionary.
- Raises:
FileNotFoundError – If the training ID directory is missing or if the .yaml file is missing.
ValueError – If the configuration is missing required keys or the values do not match the training configuration.
- Return type:
None
- codes.benchmark.check_surrogate(surrogate, conf)#
Check whether the required models for the benchmark are present in the expected directories.
- Parameters:
surrogate (str) – The name of the surrogate model to check.
conf (dict) – The configuration dictionary.
- Raises:
FileNotFoundError – If any required models are missing.
- Return type:
None
- codes.benchmark.clean_metrics(metrics, conf)#
Clean the metrics dictionary to remove problematic entries.
- Parameters:
metrics (dict) – The benchmark metrics.
conf (dict) – The configuration dictionary.
- Returns:
The cleaned metrics dictionary.
- Return type:
dict
- codes.benchmark.compare_MAE(metrics, config)#
Compare the MAE of different surrogate models over the course of training.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_UQ(all_metrics, config)#
Compare the uncertainty quantification (UQ) metrics of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_batchsize(all_metrics, config)#
Compare the batch size training errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_dynamic_accuracy(metrics, config)#
Compare the gradients of different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_extrapolation(all_metrics, config)#
Compare the extrapolation errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_inference_time(metrics, config, save=True)#
Compare the mean inference time of different surrogate models.
- Parameters:
metrics (dict[str, dict]) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_interpolation(all_metrics, config)#
Compare the interpolation errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_main_losses(metrics, config)#
Compare the training and test losses of the main models for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_models(metrics, config)#
- codes.benchmark.compare_relative_errors(metrics, config)#
Compare the relative errors over time for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.compare_sparse(all_metrics, config)#
Compare the sparse training errors of different surrogate models.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.convert_dict_to_scientific_notation(d, precision=8)#
Convert all numerical values in a dictionary to scientific notation.
- Parameters:
d (dict) – The input dictionary.
precision (int, optional) – The precision used for the scientific notation. Defaults to 8.
- Returns:
The dictionary with numerical values in scientific notation.
- Return type:
dict
- codes.benchmark.convert_to_standard_types(data)#
Recursively convert data to standard types that can be serialized to YAML.
- Parameters:
data – The data to convert.
- Returns:
The converted data.
- codes.benchmark.count_trainable_parameters(model)#
Count the number of trainable parameters in the model.
- Parameters:
model (torch.nn.Module) – The PyTorch model.
- Returns:
The number of trainable parameters.
- Return type:
int
- codes.benchmark.discard_numpy_entries(d)#
Recursively remove dictionary entries that contain NumPy arrays.
- Parameters:
d (dict) – The input dictionary.
- Returns:
A new dictionary without entries containing NumPy arrays.
- Return type:
dict
- codes.benchmark.evaluate_UQ(model, surr_name, test_loader, timesteps, conf, labels=None)#
Evaluate the uncertainty quantification (UQ) performance of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
labels (list, optional) – The labels for the chemical species.
- Returns:
A dictionary containing UQ metrics.
- Return type:
dict
- codes.benchmark.evaluate_accuracy(model, surr_name, test_loader, conf, labels=None)#
Evaluate the accuracy of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
labels (list, optional) – The labels for the chemical species.
- Returns:
A dictionary containing accuracy metrics.
- Return type:
dict
- codes.benchmark.evaluate_batchsize(model, surr_name, test_loader, timesteps, conf)#
Evaluate the performance of the surrogate model with different batch sizes.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing batch size training metrics.
- Return type:
dict
- codes.benchmark.evaluate_compute(model, surr_name, test_loader, conf)#
Evaluate the computational resource requirements of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing model complexity metrics.
- Return type:
dict
- codes.benchmark.evaluate_dynamic_accuracy(model, surr_name, test_loader, conf, species_names=None)#
Evaluate the gradients of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
species_names (list, optional) – The names of the chemical species.
- Returns:
A dictionary containing gradient metrics.
- Return type:
dict
- codes.benchmark.evaluate_extrapolation(model, surr_name, test_loader, timesteps, conf)#
Evaluate the extrapolation performance of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing extrapolation metrics.
- Return type:
dict
- codes.benchmark.evaluate_interpolation(model, surr_name, test_loader, timesteps, conf)#
Evaluate the interpolation performance of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing interpolation metrics.
- Return type:
dict
- codes.benchmark.evaluate_sparse(model, surr_name, test_loader, timesteps, n_train_samples, conf)#
Evaluate the performance of the surrogate model with sparse training data.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
timesteps (np.ndarray) – The timesteps array.
n_train_samples (int) – The number of training samples in the full dataset.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing sparse training metrics.
- Return type:
dict
- codes.benchmark.flatten_dict(d, parent_key='', sep=' - ')#
Flatten a nested dictionary.
- Parameters:
d (dict) – The dictionary to flatten.
parent_key (str) – The base key string.
sep (str) – The separator between keys.
- Returns:
Flattened dictionary with composite keys.
- Return type:
dict
- codes.benchmark.format_seconds(seconds)#
Format a duration given in seconds as hh:mm:ss.
- Parameters:
seconds (int) – The duration in seconds.
- Returns:
The formatted duration string.
- Return type:
str
- codes.benchmark.format_time(mean_time, std_time)#
Format mean and std time consistently in ns, µs, ms, or s.
- Parameters:
mean_time – The mean time.
std_time – The standard deviation of the time.
- Returns:
The formatted time string.
- Return type:
str
- codes.benchmark.get_custom_palette(num_colors)#
Returns a list of colors sampled from a custom color palette.
- Parameters:
num_colors (int) – The number of colors needed.
- Returns:
A list of RGBA color tuples.
- Return type:
list
- codes.benchmark.get_model_config(surr_name, config)#
Get the model configuration for a specific surrogate model from the dataset folder. Returns an empty dictionary if the configuration file is not found.
- Parameters:
surr_name (str) – The name of the surrogate model.
config (dict) – The configuration dictionary.
- Returns:
The model configuration dictionary.
- Return type:
dict
- codes.benchmark.get_required_models_list(surrogate, conf)#
Generate a list of required models based on the configuration settings.
- Parameters:
surrogate (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
- Returns:
A list of required model names.
- Return type:
list
- codes.benchmark.get_surrogate(surrogate_name)#
Check if the surrogate model exists.
- Parameters:
surrogate_name (str) – The name of the surrogate model.
- Returns:
The surrogate model class if it exists, otherwise None.
- Return type:
SurrogateModel | None
- codes.benchmark.inference_time_bar_plot(surrogates, means, stds, config, save=True)#
Plot the mean inference time with standard deviation for different surrogate models.
- Parameters:
surrogates (List[str]) – List of surrogate model names.
means (List[float]) – List of mean inference times for each surrogate model.
stds (List[float]) – List of standard deviation of inference times for each surrogate model.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.int_ext_sparse(all_metrics, config)#
Make a single comparative plot of the interpolation, extrapolation, and sparse training errors.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.load_model(model, training_id, surr_name, model_identifier)#
Load a trained surrogate model.
- Parameters:
model – Instance of the surrogate model class.
training_id (str) – The training identifier.
surr_name (str) – The name of the surrogate model.
model_identifier (str) – The identifier of the model (e.g., ‘main’).
- Return type:
Module
- Returns:
The loaded surrogate model.
- codes.benchmark.make_comparison_csv(metrics, config)#
Generate a CSV file comparing metrics for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.measure_memory_footprint(model, inputs)#
Measure the memory footprint of the model during the forward and backward pass.
- Parameters:
model (torch.nn.Module) – The PyTorch model.
inputs (tuple) – The input data for the model.
- Returns:
A dictionary containing memory footprint measurements.
- Return type:
dict
- codes.benchmark.plot_MAE_comparison(MAEs, labels, config, save=True)#
Plot the MAE for different surrogate models.
- Parameters:
MAEs (tuple) – Tuple of MAE arrays for each surrogate model.
labels (tuple) – Tuple of labels for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- codes.benchmark.plot_MAE_comparison_train_duration(MAEs, labels, train_durations, config, save=True)#
Plot the MAE against training duration for different surrogate models.
- Parameters:
MAEs (tuple) – Tuple of MAE arrays for each surrogate model.
labels (tuple) – Tuple of labels for each surrogate model.
train_durations (tuple) – Tuple of training durations for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- codes.benchmark.plot_average_errors_over_time(surr_name, conf, errors, metrics, timesteps, mode, save=False)#
Plot the errors over time for different modes (interpolation, extrapolation, sparse, batchsize).
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
errors (np.ndarray) – Errors array of shape [N_metrics, n_timesteps].
metrics (np.ndarray) – Metrics array of shape [N_metrics].
timesteps (np.ndarray) – Timesteps array.
mode (str) – The mode of evaluation (‘interpolation’, ‘extrapolation’, ‘sparse’, ‘batchsize’).
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.plot_average_uncertainty_over_time(surr_name, conf, errors_time, preds_std, timesteps, save=False)#
Plot the average uncertainty over time.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
errors_time (np.ndarray) – Prediction errors over time.
preds_std (np.ndarray) – Standard deviation over time of predictions from the ensemble of models.
timesteps (np.ndarray) – Timesteps array.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.plot_comparative_dynamic_correlation_heatmaps(gradients, errors, avg_correlations, max_grad, max_err, max_count, config, save=True)#
Plot comparative heatmaps of correlation between gradient and prediction errors for multiple surrogate models.
- Parameters:
gradients (dict[str, np.ndarray]) – Dictionary of gradients from the ensemble of models.
errors (dict[str, np.ndarray]) – Dictionary of prediction errors.
avg_correlations (dict[str, float]) – Dictionary of average correlations between gradients and prediction errors.
max_grad (dict[str, float]) – Dictionary of maximum gradient values for axis scaling across models.
max_err (dict[str, float]) – Dictionary of maximum error values for axis scaling across models.
max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_comparative_error_correlation_heatmaps(preds_std, errors, avg_correlations, axis_max, max_count, config, save=True)#
Plot comparative heatmaps of correlation between predictive uncertainty and prediction errors for multiple surrogate models.
- Parameters:
preds_std (dict[str, np.ndarray]) – Dictionary of standard deviation of predictions from the ensemble of models.
errors (dict[str, np.ndarray]) – Dictionary of prediction errors.
avg_correlations (dict[str, float]) – Dictionary of average correlations between predictive uncertainty and prediction errors.
axis_max (dict[str, float]) – Dictionary of maximum values for axis scaling across models.
max_count (dict[str, float]) – Dictionary of maximum count values for heatmap normalization across models.
config (dict) – Configuration dictionary.
save (bool, optional) – Whether to save the plot. Defaults to True.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_dynamic_correlation(surr_name, conf, gradients, errors, save=False)#
Plot the correlation between the gradients of the data and the prediction errors.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
gradients (np.ndarray) – The gradients of the data.
errors (np.ndarray) – The prediction errors.
save (bool) – Whether to save the plot.
- codes.benchmark.plot_dynamic_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.0001, xcut_percent=0.003)#
Plot the correlation between predictive uncertainty and prediction errors using a heatmap.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
errors (np.ndarray) – Prediction errors.
average_correlation (float) – The average correlation between gradients and prediction errors (Pearson correlation).
save (bool, optional) – Whether to save the plot as a file.
threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 1e-4.
xcut_percent (float, optional) – Fraction of the x-axis extent to cut off in the heatmap. Default is 0.003.
- Return type:
None
- codes.benchmark.plot_error_correlation_heatmap(surr_name, conf, preds_std, errors, average_correlation, save=False, threshold_factor=0.01)#
Plot the correlation between predictive uncertainty and prediction errors using a heatmap.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
errors (np.ndarray) – Prediction errors.
average_correlation (float) – The average correlation between predictive uncertainty and prediction errors (Pearson correlation).
save (bool, optional) – Whether to save the plot as a file.
threshold_factor (float, optional) – Fraction of max value below which cells are set to 0. Default is 0.01.
- Return type:
None
- codes.benchmark.plot_error_distribution_comparative(errors, conf, save=True)#
Plot the comparative distribution of errors for each surrogate model as a smoothed histogram plot.
- Parameters:
errors (dict) – Dictionary containing numpy arrays of shape [num_samples, num_timesteps, num_chemicals] for each model.
conf (dict) – The configuration dictionary.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.plot_error_distribution_per_chemical(surr_name, conf, errors, chemical_names=None, num_chemicals=10, save=True)#
Plot the distribution of errors for each chemical as a smoothed histogram plot.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
errors (np.ndarray) – Errors array of shape [num_samples, num_timesteps, num_chemicals].
chemical_names (list, optional) – List of chemical names for labeling the lines.
num_chemicals (int, optional) – Number of chemicals to plot. Default is 10.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.plot_example_predictions_with_uncertainty(surr_name, conf, preds_mean, preds_std, targets, timesteps, example_idx=0, num_chemicals=100, labels=None, save=False)#
Plot example predictions with uncertainty.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_mean (np.ndarray) – Mean predictions from the ensemble of models.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
targets (np.ndarray) – True targets.
timesteps (np.ndarray) – Timesteps array.
example_idx (int, optional) – Index of the example to plot. Default is 0.
num_chemicals (int, optional) – Number of chemicals to plot. Default is 100.
labels (list, optional) – List of labels for the chemicals.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.plot_generalization_error_comparison(surrogates, metrics_list, model_errors_list, xlabel, filename, config, save=True, xlog=False)#
Plot the generalization errors of different surrogate models.
- Parameters:
surrogates (list) – List of surrogate model names.
metrics_list (list[np.array]) – List of numpy arrays containing the metrics for each surrogate model.
model_errors_list (list[np.array]) – List of numpy arrays containing the errors for each surrogate model.
xlabel (str) – Label for the x-axis.
filename (str) – Filename to save the plot.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
xlog (bool) – Whether to use a log scale for the x-axis.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_generalization_errors(surr_name, conf, metrics, model_errors, mode, save=False)#
Plot the generalization errors of a model for various metrics.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
metrics (np.ndarray) – The metrics (e.g., intervals, cutoffs, batch sizes, number of training samples).
model_errors (np.ndarray) – The model errors.
mode (str) – The mode of generalization (“interpolation”, “extrapolation”, “sparse”, “batchsize”).
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_loss_comparison(train_losses, test_losses, labels, config, save=True)#
Plot the training and test losses for different surrogate models.
- Parameters:
train_losses (tuple) – Tuple of training loss arrays for each surrogate model.
test_losses (tuple) – Tuple of test loss arrays for each surrogate model.
labels (tuple) – Tuple of labels for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_losses(loss_histories, labels, title='Losses', save=False, conf=None, surr_name=None, mode='main')#
Plot the loss trajectories for the training of multiple models.
- Parameters:
loss_histories (tuple[array, ...]) – List of loss history arrays.
labels (tuple[str, ...]) – List of labels for each loss history.
title (str) – Title of the plot.
save (bool) – Whether to save the plot as an image file.
conf (dict, optional) – The configuration dictionary.
surr_name (str, optional) – The name of the surrogate model.
mode (str) – The mode of the training.
- Return type:
None
- codes.benchmark.plot_relative_errors(mean_errors, median_errors, timesteps, config, save=True)#
Plot the relative errors over time for different surrogate models.
- Parameters:
mean_errors (dict) – Dictionary containing the mean relative errors for each surrogate model.
median_errors (dict) – Dictionary containing the median relative errors for each surrogate model.
timesteps (np.ndarray) – Array of timesteps.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_relative_errors_over_time(surr_name, conf, relative_errors, title, save=False)#
Plot the mean and median relative errors over time with shaded regions for the 50th, 90th, and 99th percentiles.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
relative_errors (np.ndarray) – The relative errors of the model.
title (str) – The title of the plot.
save (bool) – Whether to save the plot.
- Return type:
None
- codes.benchmark.plot_surr_losses(model, surr_name, conf, timesteps)#
Plot the training and test losses for the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
timesteps (np.ndarray) – The timesteps array.
- Return type:
None
- codes.benchmark.plot_uncertainty_over_time_comparison(uncertainties, absolute_errors, timesteps, config, save=True)#
Plot the uncertainty over time for different surrogate models.
- Parameters:
uncertainties (dict) – Dictionary containing the uncertainties for each surrogate model.
absolute_errors (dict) – Dictionary containing the absolute errors for each surrogate model.
timesteps (np.ndarray) – Array of timesteps.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.plot_uncertainty_vs_errors(surr_name, conf, preds_std, errors, save=False)#
Plot the correlation between predictive uncertainty and prediction errors.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
preds_std (np.ndarray) – Standard deviation of predictions from the ensemble of models.
errors (np.ndarray) – Prediction errors.
save (bool, optional) – Whether to save the plot as a file.
- Return type:
None
- codes.benchmark.read_yaml_config(config_path)#
Read the YAML configuration file.
- Parameters:
config_path (str) – Path to the YAML configuration file.
- Returns:
The configuration dictionary.
- Return type:
dict
- codes.benchmark.rel_errors_and_uq(metrics, config, save=True)#
Create a figure with two subplots: relative errors over time and uncertainty over time for different surrogate models.
- Parameters:
metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
save (bool) – Whether to save the plot.
- Return type:
None
- Returns:
None
- codes.benchmark.run_benchmark(surr_name, surrogate_class, conf)#
Run benchmarks for a given surrogate model.
- Parameters:
surr_name (str) – The name of the surrogate model to benchmark.
surrogate_class – The class of the surrogate model.
conf (dict) – The configuration dictionary.
- Returns:
A dictionary containing all relevant metrics for the given model.
- Return type:
dict
- codes.benchmark.save_plot(plt, filename, conf, surr_name='', dpi=300, base_dir='plots', increase_count=False)#
Save the plot to a file, creating necessary directories if they don’t exist.
- Parameters:
plt (matplotlib.pyplot) – The plot object to save.
filename (str) – The desired filename for the plot.
conf (dict) – The configuration dictionary.
surr_name (str) – The name of the surrogate model.
dpi (int) – The resolution of the saved plot.
base_dir (str, optional) – The base directory where plots will be saved. Default is “plots”.
increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is False.
- Raises:
ValueError – If the configuration dictionary does not contain the required keys.
- Return type:
None
- codes.benchmark.save_plot_counter(filename, directory, increase_count=True)#
Save a plot with an incremented filename if a file with the same name already exists.
- Parameters:
filename (str) – The desired filename for the plot.
directory (str) – The directory to save the plot in.
increase_count (bool, optional) – Whether to increment the filename count if a file already exists. Default is True.
- Returns:
The full path to the saved plot.
- Return type:
str
- codes.benchmark.tabular_comparison(all_metrics, config)#
Compare the metrics of different surrogate models in a tabular format.
- Parameters:
all_metrics (dict) – Dictionary containing the benchmark metrics for each surrogate model.
config (dict) – Configuration dictionary.
- Return type:
None
- Returns:
None
- codes.benchmark.time_inference(model, surr_name, test_loader, conf, n_test_samples, n_runs=5)#
Time the inference of the surrogate model.
- Parameters:
model – Instance of the surrogate model class.
surr_name (str) – The name of the surrogate model.
test_loader (DataLoader) – The DataLoader object containing the test data.
conf (dict) – The configuration dictionary.
n_test_samples (int) – The number of test samples.
n_runs (int, optional) – Number of times to run the inference for timing.
- Returns:
A dictionary containing timing metrics.
- Return type:
dict
- codes.benchmark.write_metrics_to_yaml(surr_name, conf, metrics)#
Write the benchmark metrics to a YAML file.
- Parameters:
surr_name (str) – The name of the surrogate model.
conf (dict) – The configuration dictionary.
metrics (dict) – The benchmark metrics.
- Return type:
None