Metrics

class InnerEye.ML.metrics.InferenceMetrics[source]: Defined purely to serve as a superclass.

class InnerEye.ML.metrics.InferenceMetricsForClassification(metrics: MetricsDict)[source]

Stores a dictionary mapping from epoch number to the metrics that were achieved in that epoch.

metrics: MetricsDict

class InnerEye.ML.metrics.InferenceMetricsForSegmentation(execution_mode: ModelExecutionMode, metrics: float)[source]

Stores metrics for segmentation models, per execution mode and epoch.

execution_mode: ModelExecutionMode

get_metrics_log_key() → str[source]: Gets a string name for logging the metrics specific to the execution mode (train, val, test).

log_metrics(run_context: Optional[Run] = None) → None[source]

Logs metrics for each epoch to the logs of the provided run, or to the current run context if none is provided.

Parameters:: run_context – The run to which the metrics are logged. If None, the current run context is used.

metrics: float

InnerEye.ML.metrics.add_average_foreground_dice(metrics: MetricsDict) → None[source]

If the given metrics dictionary contains an entry for Dice score, and only one value for the Dice score per class, then add an average Dice score for all foreground classes to the metrics dictionary (modified in place).

Parameters:: metrics – The object that holds metrics. The average Dice score will be written back into this object.

InnerEye.ML.metrics.calculate_metrics_per_class(segmentation: ndarray, ground_truth: ndarray, ground_truth_ids: List[str], voxel_spacing: Tuple[float, float, float], patient_id: Optional[int] = None) → MetricsDict[source]

Calculates the Dice score for all foreground structures (the background class is completely ignored). Returns a MetricsDict with metrics for each of the foreground structures. Metrics are NaN if both ground truth and prediction are all zero for a class. If the first element of a ground truth image channel is NaN, the image is flagged as NaN and not used.

Parameters:

ground_truth_ids – The names of all foreground classes.
segmentation – predictions multi-value array with dimensions: [Z x Y x X]
ground_truth – ground truth binary array with dimensions: [C x Z x Y x X].
voxel_spacing – voxel_spacing in 3D Z x Y x X
patient_id – for logging
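
As a rough illustration of the per-class Dice computation described above, the following sketch works on flattened voxel lists rather than [Z x Y x X] arrays. It is not the InnerEye implementation: the function name `dice_per_class` is hypothetical, and the sketch assumes that channel 0 of the ground truth is the background and that class index i in the segmentation corresponds to ground truth channel i.

```python
import math
from typing import Dict, List, Sequence


def dice_per_class(segmentation: Sequence[int],
                   ground_truth: Sequence[Sequence[int]],
                   ground_truth_ids: List[str]) -> Dict[str, float]:
    """Dice score per foreground class on flattened voxel lists (hypothetical helper).

    segmentation: one predicted class index per voxel (0 is assumed to be background).
    ground_truth: one binary mask per class, with the background mask first (assumption).
    ground_truth_ids: names of the foreground classes, in channel order.
    """
    scores: Dict[str, float] = {}
    for class_index, name in enumerate(ground_truth_ids, start=1):
        # Binary mask of the voxels predicted as this class.
        predicted = [1 if label == class_index else 0 for label in segmentation]
        reference = ground_truth[class_index]
        intersection = sum(p * r for p, r in zip(predicted, reference))
        total = sum(predicted) + sum(reference)
        # NaN when neither prediction nor ground truth contains the class.
        scores[name] = math.nan if total == 0 else 2.0 * intersection / total
    return scores
```

A class that is absent from both the prediction and the ground truth yields NaN in this sketch, mirroring the behaviour described for the real function.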

InnerEye.ML.metrics.compute_dice_across_patches(segmentation: Tensor, ground_truth: Tensor, allow_multiple_classes_for_each_pixel: bool = False) → Tensor[source]

Computes the Dice scores for all classes across all patches in the arguments.

Parameters:

segmentation – Tensor containing class ids predicted by a model.
ground_truth – One-hot encoded torch tensor containing ground-truth label ids.
allow_multiple_classes_for_each_pixel – If set to False, ground-truth tensor has to contain only one foreground label for each pixel.

Returns:

A torch tensor of size (Patches, Classes) with the Dice scores. Dice scores are computed for all classes including the background class at index 0.

InnerEye.ML.metrics.compute_scalar_metrics(metrics_dict: ScalarMetricsDict, subject_ids: Sequence[str], model_output: Tensor, labels: Tensor, loss_type: ScalarLoss = ScalarLoss.BinaryCrossEntropyWithLogits) → None[source]

Computes various metrics for a binary classification task from real-valued model output and a label vector, and stores them in the given metrics_dict. The model output is assumed to be in the range between 0 and 1, where a value larger than 0.5 indicates a prediction of class 1. The label vector is expected to contain class indices 0 and 1 only. Metrics for each model output channel are kept separate: a non-default hue is expected for each model output channel, and it must already exist in the provided metrics_dict. The default hue is used for single model outputs.

Parameters:

metrics_dict – An object that holds all metrics. It will be updated in-place.
subject_ids – Subject ids for the model output and labels.
model_output – A tensor containing model outputs.
labels – A tensor containing class labels.
loss_type – The type of loss that the model uses. This is required to optionally convert 2-dim model output to probabilities.
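
The 0.5 decision rule mentioned above can be sketched in plain Python. This is only an illustration under simple assumptions (probabilities already in [0, 1], one output channel); the helper name `accuracy_at_threshold` is hypothetical and is not part of InnerEye.

```python
from typing import Sequence


def accuracy_at_threshold(probabilities: Sequence[float],
                          labels: Sequence[int],
                          threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label (hypothetical helper)."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")
    # A probability strictly larger than the threshold counts as a prediction of class 1.
    predicted = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)
```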

InnerEye.ML.metrics.store_epoch_metrics(metrics: Dict[str, float], epoch: int, file_logger: DataframeLogger) → None[source]

Writes all metrics (apart from ones that measure run time) into a CSV file, with an additional column for the epoch number.

Parameters:

file_logger – An instance of DataframeLogger, for logging results to csv.
epoch – The epoch corresponding to the results.
metrics – The metrics of the specified epoch, averaged along its batches.
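
A minimal sketch of the idea, using only the standard library instead of DataframeLogger. The column name `epoch`, the helper name `write_epoch_row`, and the convention that run-time metrics start with `Seconds` are all assumptions made for this example, not details taken from InnerEye.

```python
import csv
import io
from typing import IO, Dict, Union


def write_epoch_row(metrics: Dict[str, float],
                    epoch: int,
                    stream: IO[str],
                    timing_prefix: str = "Seconds") -> None:
    """Write a header and a single CSV row holding the epoch and all non-timing metrics."""
    row: Dict[str, Union[int, float]] = {"epoch": epoch}
    # Drop metrics that measure run time (identified here by a hypothetical name prefix).
    row.update({name: value for name, value in metrics.items()
                if not name.startswith(timing_prefix)})
    writer = csv.DictWriter(stream, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)


buffer = io.StringIO()
write_epoch_row({"loss": 0.25, "SecondsPerEpoch": 12.0}, epoch=3, stream=buffer)
print(buffer.getvalue())
```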

InnerEye.ML.metrics.surface_distance(seg: Image, reference_segmentation: Image) → float[source]

Computes symmetric surface distances, taking the image spacing into account. See https://github.com/InsightSoftwareConsortium/SimpleITK-Notebooks/blob/master/Python/34_Segmentation_Evaluation.ipynb

Parameters:

seg – mask 1
reference_segmentation – mask 2

Returns:

mean distance

class InnerEye.ML.metrics_dict.DataframeLogger(csv_path: Union[str, Path, IO], fixed_columns: Optional[Dict[str, Any]] = None)[source]

Single DataFrame logger for logging to CSV file

add_record(record: Dict[str, Any]) → None[source]

flush(log_info: bool = False) → None[source]

Save the internal records to a csv file.

Parameters:: log_info – If true, write the final dataframe also to logging.info.

class InnerEye.ML.metrics_dict.Hue(name: str, values: Dict[str, List[Union[float, int]]] = <factory>, predictions: List[ndarray] = <factory>, labels: List[ndarray] = <factory>, subject_ids: List[str] = <factory>)[source]

Dataclass to encapsulate hue-specific data for metrics computation.

add_predictions(subject_ids: Sequence[str], predictions: ndarray, labels: ndarray) → None[source]

Adds predictions and labels for later computing the area under the ROC curve.

Parameters:

subject_ids – Subject ids associated with the predictions and labels.
predictions – A numpy array with model predictions, of size [N x C] for N samples in C classes, or size [N x 1] or size [N] for binary.
labels – A numpy array with labels, of size [N x C] for N samples in C classes, or size [N x 1] or size [N] for binary.

enumerate_single_values() → Iterable[Tuple[str, float]][source]: Returns an iterator over all (metric name, metric value) tuples that are stored in the present object. The method assumes that there is exactly 1 metric value stored per name, and raises a ValueError if that is not the case.

get_labels() → ndarray[source]: Return a concatenated copy of the roc labels stored internally.

get_predictions() → ndarray[source]: Return a concatenated copy of the roc predictions stored internally.

get_predictions_and_labels_per_subject() → List[PredictionEntry[float]][source]: Gets the per-subject predictions that are stored in the present object.

property has_prediction_entries: bool: Returns True if the present object stores any entries for computing the Area Under Roc Curve metric.

labels: List[ndarray]

name: str

predictions: List[ndarray]

subject_ids: List[str]

values: Dict[str, List[Union[float, int]]]

class InnerEye.ML.metrics_dict.MetricsDict(hues: Optional[List[str]] = None, is_classification_metrics: bool = True)[source]

This class helps aggregate an arbitrary number of metrics across multiple batches or multiple samples. Metrics are identified by a string name. Metrics can have further hues which are isolated metrics records, and can be used for cases such as different anatomical structures, where we might want to maintain separate metrics for each structure, to perform independent aggregations.

DATAFRAME_COLUMNS = ['prediction_target', 'metrics']

DEFAULT_HUE_KEY = 'Default'

add_diagnostics(name: str, value: Any) → None[source]

Adds a diagnostic value to the present object. Multiple diagnostics can be stored per unique value of name, the values get concatenated.

Parameters:

name – The name of the diagnostic value to store.
value – The value to store.

add_metric(metric_name: Union[str, MetricType], metric_value: Union[float, int], skip_nan_when_averaging: bool = False, hue: str = 'Default') → None[source]

Adds values for a single metric to the present object, when the metric value is a scalar.

Parameters:

metric_name – The name of the metric to add. This can be a string or a value in the MetricType enum.
metric_value – The value of the metric, as a float or integer.
skip_nan_when_averaging – If True, averaging this metric will skip any NaN (not a number) values. If False, NaN will propagate through the mean computation.
hue – The hue to which this record belongs. The default hue is used if none is provided.
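
To make the notion of hues more concrete, here is a deliberately small stand-in that keeps metric values separate per hue. The class `TinyMetricsStore` is hypothetical and only mimics the shape of `add_metric` and `num_entries`; it does not reproduce MetricsDict behaviour such as NaN handling or validation of hue names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class TinyMetricsStore:
    """Hypothetical, simplified store: hue name -> metric name -> list of values."""

    def __init__(self) -> None:
        self.values: DefaultDict[str, DefaultDict[str, List[float]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add_metric(self, metric_name: str, metric_value: float, hue: str = "Default") -> None:
        # Values for the same metric name are isolated per hue.
        self.values[hue][metric_name].append(float(metric_value))

    def num_entries(self, hue: str = "Default") -> Dict[str, int]:
        return {name: len(items) for name, items in self.values[hue].items()}


store = TinyMetricsStore()
store.add_metric("Dice", 0.8, hue="liver")
store.add_metric("Dice", 0.6, hue="liver")
store.add_metric("Loss", 0.3)
print(store.num_entries("liver"), store.num_entries())
```

Keeping one record per hue is what allows a separate aggregation for each anatomical structure.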

add_predictions(subject_ids: Sequence[str], predictions: ndarray, labels: ndarray, hue: str = 'Default') → None[source]

Adds predictions and labels for later computing the area under the ROC curve.

Parameters:

subject_ids – Subject ids associated with the predictions and labels.
predictions – A numpy array with model predictions, of size [N x C] for N samples in C classes, or size [N x 1] or size [N] for binary.
labels – A numpy array with labels, of size [N x C] for N samples in C classes, or size [N x 1] or size [N] for binary.
hue – The hue to which these predictions belong. The default hue is used if none is provided.

average(add_metrics_from_entries: bool = False, across_hues: bool = True) → MetricsDict[source]

Returns a MetricsDict object that only contains the per-metric averages (arithmetic mean) from the present object. Computing the average will respect the skip_nan_when_averaging value that has been provided when adding the metric.

Parameters:

add_metrics_from_entries – average existing metrics in the dict.
across_hues – If True, values of the same metric type are averaged regardless of hue. Otherwise, a separate average is computed for each metric type within each hue. Default is True.

Returns:

A MetricsDict object with a single-item list for each of the metrics.

delete_hue(hue: str) → None[source]

Removes all data stored for the given hue from the present object.

Parameters:: hue – The hue to remove.

delete_metric(metric_name: Union[str, MetricType], hue: str = 'Default') → None[source]

Deletes all values that are stored for a given metric from the present object.

Parameters:

metric_name – The name of the metric to delete. This can be a string or a value in the MetricType enum.
hue – The hue to which this record belongs. The default hue is used if none is provided.

enumerate_single_values(hue: Optional[str] = None) → Iterable[Tuple[str, str, float]][source]

Returns an iterator over all (hue name, metric name, metric value) tuples that are stored in the present object. This method assumes that for each hue/metric combination there is exactly 1 value, and it throws an exception if there is more than 1 value.

Parameters:: hue – The hue to restrict the values, otherwise all values will be used if set to None.
Returns:: An iterator with (hue name, metric name, metric values) pairs.

enumerate_single_values_groupwise() → Iterable[Tuple[str, Iterable[Tuple[str, float]]]][source]: Returns an iterator over (hue name, metric_name_and_value) tuples that are stored in the present object. The second tuple element is again an iterator that returns all metric name and value tuples that are stored for that specific hue. This method assumes that for each hue/metric combination there is exactly 1 value, and it throws an exception if there is more than 1 value.

get_accuracy_at05(hue: str = 'Default') → float[source]: Returns the binary classification accuracy at threshold 0.5

get_cross_entropy(hue: str = 'Default') → float[source]

Computes the binary cross entropy from the entries that were supplied in the add_predictions method.

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: The cross entropy score.

get_hue_names(include_default: bool = True) → List[str][source]

Returns all of the hues supported by this metrics dict

Parameters:: include_default – Include the default hue if True, otherwise exclude the default hue.

get_labels(hue: str = 'Default') → ndarray[source]

Return a concatenated copy of the roc labels stored internally.

Parameters:: hue – The hue to restrict the values, otherwise all values will be used.
Returns:: roc labels as np array

get_mean_absolute_error(hue: str = 'Default') → float[source]

Get the mean absolute error.

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: Mean absolute error.

get_mean_squared_error(hue: str = 'Default') → float[source]

Get the mean squared error.

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: Mean squared error

get_metrics_at_optimal_cutoff(hue: str = 'Default') → Tuple[source]

Computes the ROC to find the optimal cut-off, i.e. the probability threshold for which the difference between true positive rate and false positive rate is largest. Then computes the false positive rate, false negative rate and accuracy at this threshold (i.e. when the predicted probability is higher than the threshold, the predicted label is 1, otherwise 0).

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: Tuple(optimal_threshold, false positive rate, false negative rate, accuracy)

classmethod get_optimal_idx(fpr: ndarray, tpr: ndarray) → ndarray[source]: Given a list of FPR and TPR values corresponding to different thresholds, compute the index which corresponds to the optimal threshold.
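
One common way to pick such an index is Youden's J statistic, which selects the ROC point where TPR minus FPR is largest. The sketch below uses that criterion as an assumption about what "optimal" means here; the function name `optimal_index` is hypothetical and the code works on plain Python lists rather than ndarrays.

```python
from typing import Sequence


def optimal_index(fpr: Sequence[float], tpr: Sequence[float]) -> int:
    """Index of the ROC point that maximises TPR - FPR (Youden's J, assumed criterion)."""
    if len(fpr) != len(tpr) or len(fpr) == 0:
        raise ValueError("fpr and tpr must be non-empty and of equal length")
    differences = [t - f for f, t in zip(fpr, tpr)]
    # max() returns the first index in case of ties.
    return max(range(len(differences)), key=differences.__getitem__)


fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]
print(optimal_index(fpr, tpr))
```

The threshold at the returned index would then be used to compute the false positive rate, false negative rate and accuracy.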

get_pr_auc(hue: str = 'Default') → float[source]

Computes the Area Under the Precision Recall Curve, from the entries that were supplied in the add_predictions method.

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: The PR AUC score, or np.nan if no entries are available in the present object.

get_predictions(hue: str = 'Default') → ndarray[source]

Return a concatenated copy of the roc predictions stored internally.

Parameters:: hue – The hue to restrict the values, otherwise all values will be used.
Returns:: concatenated roc predictions as np array

get_predictions_and_labels_per_subject(hue: str = 'Default') → List[PredictionEntry[float]][source]

Gets the per-subject labels and predictions that are stored in the present object.

Parameters:: hue – The hue to restrict the values, otherwise the default hue will be used.
Returns:: List of per-subject labels and predictions

get_r2_score(hue: str = 'Default') → float[source]

Get the R2 score.

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: R2 score

get_roc_auc(hue: str = 'Default') → float[source]

Computes the Area Under the ROC curve, from the entries that were supplied in the add_predictions method.

Parameters:: hue – The hue to restrict the values used for computation, otherwise all values will be used.
Returns:: The AUC score, or np.nan if no entries are available in the present object.

get_single_metric(metric_name: Union[str, MetricType], hue: str = 'Default') → Union[float, int][source]

Gets the value stored for the given metric. The method assumes that there is a single value stored for the metric, and raises a ValueError if that is not the case.

Parameters:

metric_name – The name of the metric to retrieve.
hue – The hue to retrieve the metric from.

Returns:: The single value stored for the given metric.

has_prediction_entries(hue: str = 'Default') → bool[source]

Returns True if the present object stores any entries for computing the Area Under Roc Curve metric.

Parameters:: hue – The hue to check. If not provided, the default hue is used.
Returns:: True if entries exist. False otherwise.

num_entries(hue: str = 'Default') → Dict[str, int][source]

Gets the number of values that are stored for each individual metric.

Parameters:: hue – The hue to count entries for, otherwise all entries will be counted.
Returns:: A dictionary mapping from metric name to number of values stored.

subject_ids(hue: str = 'Default') → List[str][source]

Return the subject ids that have metrics associated with them in this dictionary.

Parameters:: hue – If provided then subject ids belonging to this hue only will be returned. Otherwise subject ids for the default hue will be returned.

to_data_frame() → DataFrame[source]: Creates a data frame representation of the metrics dict in the format with the Hue name as a column and a string representation of all metrics for that hue as a second column.

to_string(tabulate: bool = True) → str[source]

Creates a multi-line human readable string from the given metrics.

Parameters:: tabulate – If True then create a pretty printable table string.
Returns:: Formatted metrics string

values(hue: str = 'Default') → Dict[str, Any][source]

Returns values held currently in the dict

Parameters:: hue – The hue to restrict the values to. If not provided, the values in the default hue are returned.
Returns:: Dictionary of values for this object.

class InnerEye.ML.metrics_dict.PredictionEntry(*args, **kwds)[source]

labels: T

predictions: T

subject_id: str

class InnerEye.ML.metrics_dict.ScalarMetricsDict(hues: Optional[List[str]] = None, is_classification_metrics: bool = True)[source]

Specialization of the MetricsDict with Classification related functions.

static aggregate_and_save_execution_mode_metrics(metrics: Dict[ModelExecutionMode, Dict[Union[int, str], ScalarMetricsDict]], data_frame_logger: DataframeLogger, log_info: bool = True) → None[source]

Given metrics dicts for execution modes and epochs, compute the aggregate metrics that are computed from the per-subject predictions. The metrics are written to the dataframe logger with the string labels (column names) taken from the MetricType enum.

Parameters:

metrics – Mapping from execution mode to a mapping from epoch to subject-level metrics.
data_frame_logger – DataFrame logger to write to and flush
log_info – If True then log results as an INFO string to the default logger also.

binary_classification_accuracy(hue: str = 'Default') → float[source]

Parameters:: hue – The hue to restrict the values, otherwise all values will be used.
Returns:: binary classification accuracy

diagnostics: Dict[str, List[Any]]

hues: OrderedDict[str, Hue]

static load_execution_mode_metrics_from_df(df: DataFrame, is_classification_metrics: bool) → Dict[ModelExecutionMode, Dict[Union[int, str], ScalarMetricsDict]][source]

Helper function to create ScalarMetricsDict objects grouped by ModelExecutionMode and epoch from a given dataframe. The following columns must exist in the provided data frame:

LoggingColumns.DataSplit
LoggingColumns.Epoch

Parameters:

df – DataFrame to use for creating the metrics dict.
is_classification_metrics – If the current metrics are for classification or not.

row_labels: List[str]

skip_nan_when_averaging: Dict[str, bool]

store_metrics_per_subject(df_logger: DataframeLogger, mode: ModelExecutionMode, epoch: Union[int, str], cross_validation_split_index: int = -1) → None[source]

Store metrics using the provided df_logger at subject level for classification models.

Parameters:

df_logger – A data frame logger to use to write the metrics to disk.
mode – Model execution mode these metrics belong to.
cross_validation_split_index – Cross-validation split index for the epoch, if performing cross-validation.

class InnerEye.ML.metrics_dict.SequenceMetricsDict(hues: Optional[List[str]] = None, is_classification_metrics: bool = True)[source]

Specialization of the MetricsDict with Sequence related functions.

static create(is_classification_model: bool, sequence_target_positions: List[int]) → SequenceMetricsDict[source]

diagnostics: Dict[str, List[Any]]

static get_hue_name_from_target_index(target_index: int) → str[source]: Creates a metrics hue name for sequence models, from a target index. For a sequence model that predicts at index 7, the hue name would be “Seq_pos 07”

static get_target_index_from_hue_name(hue_name: str) → int[source]

Extracts a sequence target index from a metrics hue name. For example, from metrics hue “Seq_pos 07”, it would return 7.

Parameters:: hue_name – hue name containing sequence target index
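
The round trip between target index and hue name can be sketched as follows. The format is inferred only from the "Seq_pos 07" example above (a fixed prefix, a space, and a zero-padded two-digit index); the constant and function names are hypothetical, and the real implementation may differ in details such as error handling.

```python
SEQUENCE_HUE_PREFIX = "Seq_pos"  # assumed from the "Seq_pos 07" example


def hue_name_from_target_index(target_index: int) -> str:
    """Build a hue name such as 'Seq_pos 07' from a sequence target index."""
    return f"{SEQUENCE_HUE_PREFIX} {target_index:02d}"


def target_index_from_hue_name(hue_name: str) -> int:
    """Recover the target index from a hue name produced by the function above."""
    prefix, _, digits = hue_name.partition(" ")
    if prefix != SEQUENCE_HUE_PREFIX or not digits.isdigit():
        raise ValueError(f"Not a sequence hue name: {hue_name!r}")
    return int(digits)


print(hue_name_from_target_index(7))
print(target_index_from_hue_name("Seq_pos 07"))
```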

hues: OrderedDict[str, Hue]

row_labels: List[str]

skip_nan_when_averaging: Dict[str, bool]

InnerEye.ML.metrics_dict.average_metric_values(values: List[float], skip_nan_when_averaging: bool) → float[source]

Returns the average (arithmetic mean) of the values provided. If skip_nan_when_averaging is True, the mean will be computed without any possible NaN values in the list.

Parameters:

values – The individual values that should be averaged.
skip_nan_when_averaging – If True, compute the mean ignoring any NaN values. If False, any NaN value present in the argument will make the function return NaN.

Returns:

The average of the provided values. If the argument is an empty list, NaN will be returned.
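
The NaN handling described above can be illustrated with a few lines of standard-library Python. This is a sketch of the documented behaviour, not the library code; the name `average_values` is hypothetical, and returning NaN when every value is NaN and skipping is enabled is an assumption of this example.

```python
import math
from typing import List


def average_values(values: List[float], skip_nan_when_averaging: bool) -> float:
    """Arithmetic mean with optional removal of NaN values before averaging."""
    if skip_nan_when_averaging:
        values = [v for v in values if not math.isnan(v)]
    if not values:
        # Empty input (or only NaN values when skipping) gives NaN.
        return math.nan
    # Without skipping, a single NaN makes the sum, and hence the mean, NaN.
    return sum(values) / len(values)


print(average_values([1.0, math.nan, 3.0], skip_nan_when_averaging=True))
print(average_values([1.0, math.nan, 3.0], skip_nan_when_averaging=False))
```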

InnerEye.ML.metrics_dict.get_column_name_for_logging(metric_name: Union[str, MetricType], hue_name: Optional[str] = None) → str[source]

Computes the column name that should be used when logging a metric to disk. Raises a ValueError when no column name has yet been defined.

Parameters:

metric_name – The name of the metric.
hue_name – If provided will be used as a prefix hue_name/column_name

InnerEye.ML.metrics_dict.get_metric_name_with_hue_prefix(metric_name: str, hue_name: Optional[str] = None) → str[source]: If hue_name is provided and is not equal to the default hue then it will be used as a prefix hue_name/column_name, otherwise metric_name will be returned.
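
The prefixing rule is simple enough to sketch directly. The function below follows the description above literally and is an illustration only; the name `metric_name_with_hue_prefix` is hypothetical, and the default hue name is taken from DEFAULT_HUE_KEY = 'Default' as documented for MetricsDict.

```python
from typing import Optional

DEFAULT_HUE_KEY = "Default"  # value documented for MetricsDict.DEFAULT_HUE_KEY


def metric_name_with_hue_prefix(metric_name: str, hue_name: Optional[str] = None) -> str:
    """Prefix the metric name with 'hue_name/' unless the hue is missing or the default."""
    if hue_name is not None and hue_name != DEFAULT_HUE_KEY:
        return f"{hue_name}/{metric_name}"
    return metric_name


print(metric_name_with_hue_prefix("Dice", "liver"))
print(metric_name_with_hue_prefix("Dice"))
```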