Architectures

Base Model

class InnerEye.ML.models.architectures.base_model.BaseSegmentationModel(name: str, input_channels: int, crop_size_constraints: Optional[CropSizeConstraints] = None)[source]

Base neural network segmentation model.

abstract forward(input: Any) → Any[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

generate_model_summary(crop_size: Optional[Tuple[int, int, int]] = None, log_summaries_to_files: bool = False) → None[source]

Stores a model summary, containing information about layers, memory consumption and runtime in the model.summary field. When called again with the same crop_size, the summary is not created again.

Parameters:

crop_size – The crop size for which the summary should be created. If not provided, the minimum allowed crop size is used.
log_summaries_to_files – whether to write the summary to a file

get_all_child_layers() → List[Module][source]

get_output_shape(input_shape: Union[Tuple[int, int], Tuple[int, int, int]]) → Tuple[int, ...][source]

Computes model’s output tensor shape for given input tensor shape. The argument is expected to be either a 2-tuple or a 3-tuple. A batch dimension (1) and the number of channels are added as the first dimensions. The result tuple has batch and channel dimension stripped off.

Parameters:

input_shape – A tuple (2D or 3D) representing incoming tensor shape.
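The shape bookkeeping described for get_output_shape can be sketched in plain Python. This is only an illustration under stated assumptions: `spatial_output_shape` and `toy_forward_shape` are hypothetical names, and the toy shape function stands in for a real forward pass through the model.

```python
from typing import Callable, Tuple

Shape = Tuple[int, ...]


def spatial_output_shape(input_shape: Shape,
                         input_channels: int,
                         forward_shape: Callable[[Shape], Shape]) -> Shape:
    """Add batch and channel dimensions, apply the model, strip them again."""
    if len(input_shape) not in (2, 3):
        raise ValueError("input_shape must be a 2-tuple or a 3-tuple")
    # Batch dimension (1) and channel count become the first two dimensions.
    full_input = (1, input_channels) + tuple(input_shape)
    full_output = forward_shape(full_input)
    # Drop batch and channel dimensions from the result.
    return tuple(full_output[2:])


def toy_forward_shape(shape: Shape) -> Shape:
    # Hypothetical stand-in for a model: 2 output classes, and an unpadded
    # 3-wide convolution that trims 2 voxels from every spatial dimension.
    batch, _channels, *spatial = shape
    return (batch, 2) + tuple(s - 2 for s in spatial)


result = spatial_output_shape((32, 64, 64), input_channels=1,
                              forward_shape=toy_forward_shape)
```

With the toy shape function above, a (32, 64, 64) input gives a purely spatial result; the real method would obtain the full output shape from the network itself.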

partition_model(devices: Optional[List[device]] = None) → None[source]: A method to partition a neural network model across multiple devices. If no list of devices is given, use all available GPU devices.

training: bool

validate_crop_size(crop_size: Tuple[int, int, int], message_prefix: Optional[str] = None) → None[source]

Checks if the given crop size is a valid crop size for the present model. If it is not valid, a ValueError is raised.

Parameters:

crop_size – The crop size that should be checked.
message_prefix – A string prefix for the error message if the crop size is found to be invalid.
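A minimal sketch of such a check, assuming the constraints are a per-dimension multiple_of and minimum_size as suggested by the CropSizeConstraints signature. The function name `check_crop_size` and the message wording are hypothetical and not taken from the InnerEye code.

```python
from typing import Optional, Tuple

TupleInt3 = Tuple[int, int, int]


def check_crop_size(crop_size: TupleInt3,
                    multiple_of: TupleInt3,
                    minimum_size: TupleInt3,
                    message_prefix: Optional[str] = None) -> None:
    """Raise ValueError if any dimension violates the (assumed) constraints."""
    prefix = f"{message_prefix}: " if message_prefix else ""
    for dim, (size, multiple, minimum) in enumerate(
            zip(crop_size, multiple_of, minimum_size)):
        if size < minimum:
            raise ValueError(f"{prefix}crop size {crop_size} is below the "
                             f"minimum of {minimum} in dimension {dim}")
        if size % multiple != 0:
            raise ValueError(f"{prefix}crop size {crop_size} is not a "
                             f"multiple of {multiple} in dimension {dim}")


# A valid crop size passes silently.
check_crop_size((64, 128, 128), multiple_of=(16, 16, 16), minimum_size=(16, 16, 16))
```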

class InnerEye.ML.models.architectures.base_model.CropSizeConstraints(multiple_of: Optional[Union[int, Tuple[int, int, int], Iterable]] = None, minimum_size: Optional[Union[int, Tuple[int, int, int], Iterable]] = None, num_dimensions: int = 3)[source]

restrict_crop_size_to_image(image_shape: Tuple[int, int, int], crop_size: Tuple[int, int, int], stride_size: Tuple[int, int, int]) → Tuple[Tuple[int, int, int], Tuple[int, int, int]][source]

Computes an adjusted crop and stride size for cases where the image is smaller than the chosen crop size (at test time). The new crop size will be the largest multiple of self.multiple_of that fits into the image_shape. The stride size will attempt to maintain the stride-to-crop ratio before adjustment.

Parameters:

image_shape – The shape of the image to process.
crop_size – The present test crop size.
stride_size – The present inference stride size.

Returns:

A tuple of (crop_size, stride_size)
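The adjustment can be sketched as follows. This is an illustration of the description above, not the InnerEye implementation: the function name `restrict_crop_and_stride`, the truncating rounding of the stride, and the fallback to a single multiple when the image is smaller than multiple_of are all assumptions.

```python
from typing import Tuple

TupleInt3 = Tuple[int, int, int]


def restrict_crop_and_stride(image_shape: TupleInt3,
                             crop_size: TupleInt3,
                             stride_size: TupleInt3,
                             multiple_of: TupleInt3) -> Tuple[TupleInt3, TupleInt3]:
    new_crop, new_stride = [], []
    for image, crop, stride, multiple in zip(image_shape, crop_size,
                                             stride_size, multiple_of):
        if image >= crop:
            # The crop already fits into the image: keep both values.
            new_crop.append(crop)
            new_stride.append(stride)
            continue
        # Largest multiple that fits into the image (assumption: at least one multiple).
        restricted = max((image // multiple) * multiple, multiple)
        new_crop.append(restricted)
        # Keep the original stride-to-crop ratio (assumption: truncate, minimum 1).
        new_stride.append(max(1, int(restricted * stride / crop)))
    return tuple(new_crop), tuple(new_stride)


adjusted = restrict_crop_and_stride(image_shape=(20, 200, 200),
                                    crop_size=(64, 128, 128),
                                    stride_size=(32, 64, 64),
                                    multiple_of=(16, 16, 16))
```

In this example only the first dimension is too small for the crop, so only that dimension is shrunk, and its stride keeps the original 1:2 ratio to the crop size.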

validate(crop_size: Tuple[int, int, int], message_prefix: Optional[str] = None) → None[source]

Checks if the given crop size is a valid crop size for the present model. If it is not valid, a ValueError is raised.

Parameters:

crop_size – The crop size that should be checked.
message_prefix – A string prefix for the error message if the crop size is found to be invalid.

U-Nets

class InnerEye.ML.models.architectures.unet_3d.UNet3D(input_image_channels: int, initial_feature_channels: int, num_classes: int, kernel_size: Union[int, Tuple[int, int, int], Iterable], name: str = 'UNet3D', num_downsampling_paths: int = 4, downsampling_factor: Union[int, Tuple[int, int, int], Iterable] = 2, downsampling_dilation: Union[int, Tuple[int, int, int], Iterable] = (1, 1, 1), padding_mode: PaddingMode = PaddingMode.Zero)[source]

Implementation of the 3D UNet model. Ref: Ronneberger et al., U-Net: Convolutional Networks for Biomedical Image Segmentation. The implementation differs from the original architecture in the following ways:

1) Pooling layers are replaced with strided convolutions to learn the downsampling operations.
2) Upsampling layers have spatial support larger than 2x2x2 so that they can learn an interpolation as good as linear upsampling.
3) Non-linear activation units are placed between deconv and conv operations to avoid two redundant linear operations one after another.
4) More downsampling operations are supported, to capture larger image context and improve performance.

The network has num_downsampling_paths downsampling steps on the encoding side and the same number of upsampling steps on the decoding side.

Parameters:

num_downsampling_paths – Number of downsampling paths used in the UNet model (by default, 4 image levels are used)
num_classes – Number of output segmentation classes
kernel_size – Spatial support of convolution kernels used in Unet model
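To make the role of num_downsampling_paths concrete, the sketch below lists the spatial size at each level of the encoder. It assumes that every strided convolution divides each spatial dimension exactly by downsampling_factor; the helper name `spatial_sizes_per_level` is hypothetical.

```python
from typing import List, Tuple


def spatial_sizes_per_level(crop_size: Tuple[int, int, int],
                            num_downsampling_paths: int = 4,
                            downsampling_factor: int = 2) -> List[Tuple[int, ...]]:
    """Return the spatial size at the input and after each downsampling step."""
    sizes = [tuple(crop_size)]
    for _ in range(num_downsampling_paths):
        previous = sizes[-1]
        if any(s % downsampling_factor != 0 for s in previous):
            raise ValueError(f"size {previous} is not divisible by {downsampling_factor}")
        sizes.append(tuple(s // downsampling_factor for s in previous))
    return sizes


levels = spatial_sizes_per_level((64, 128, 128))
```

Under this assumption, a crop must be a multiple of downsampling_factor ** num_downsampling_paths (16 with the defaults) in every dimension for all levels to have integer sizes, which is the kind of restriction that crop size constraints express.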

class UNetDecodeBlock(channels: ~typing.Tuple[int, int], upsample_kernel_size: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable], upsampling_stride: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable] = 2, padding_mode: ~InnerEye.ML.config.PaddingMode = PaddingMode.Zero, activation: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, depth: ~typing.Optional[int] = None)[source]

Implements an upsampling block for the UNet architecture. The operations carried out on the input tensor are: 1) upsampling via strided convolutions, 2) concatenating the skip connection tensor, 3) two convolution layers.

Parameters:

channels – A tuple containing the number of input and output channels
upsample_kernel_size – Spatial support of upsampling kernels. If an integer is provided, the same value will be repeated for all three dimensions. For non-cubic kernels please pass a list or tuple with three elements.
upsampling_stride – Upsampling factor used in the deconvolutional layer. Similar to the upsample_kernel_size parameter, if an integer is passed, the same upsampling factor will be used for all three dimensions.
activation – Linear/Non-linear activation function that is used after linear deconv/conv mappings.
depth – The depth inside the UNet at which the layer operates. This is only for diagnostic purposes.

forward(x: Any) → Any[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

class UNetEncodeBlock(channels: ~typing.Tuple[int, int], kernel_size: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable], downsampling_stride: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable] = 1, dilation: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable] = 1, padding_mode: ~InnerEye.ML.config.PaddingMode = PaddingMode.Zero, activation: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, use_residual: bool = True, depth: ~typing.Optional[int] = None)[source]

Implements an EncodeBlock for UNet. An EncodeBlock is two BasicLayers without dilation and with same padding. The first of those BasicLayers can use stride > 1, and hence will downsample.

Parameters:

channels – A list containing two elements representing the number of input and output channels
kernel_size – Spatial support of convolution kernels. If an integer is provided, the same value will be repeated for all three dimensions. For non-cubic kernels please pass a tuple with three elements.
downsampling_stride – Downsampling factor used in the first convolutional layer. If an integer is passed, the same downsampling factor will be used for all three dimensions.
dilation – Dilation of convolution kernels. If set to > 1, kernels capture content from a wider range.
activation – Linear/Non-linear activation function that is used after linear convolution mappings.
use_residual – If set to True, block2 learns the residuals while preserving the output of block1
depth – The depth inside the UNet at which the layer operates. This is only for diagnostic purposes.

forward(x: Any) → Any[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

class UNetEncodeBlockSynthesis(channels: ~typing.Tuple[int, int], kernel_size: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable], dilation: ~typing.Union[int, ~typing.Tuple[int, int, int], ~typing.Iterable] = 1, padding_mode: ~InnerEye.ML.config.PaddingMode = PaddingMode.Zero, activation: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, depth: ~typing.Optional[int] = None)[source]

Encode block used in the upsampling path of the UNet model. It differs from UNetEncodeBlock by being able to aggregate information coming from both the skip connection and the upsampled tensors. Instead of using a standard concatenation op followed by a convolution op, this encoder block decomposes the chain of these ops into multiple convolutions, which reduces memory usage.
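The decomposition rests on the linearity of convolution: applying one kernel to a concatenation of two tensors gives the same result as applying the two matching slices of that kernel separately and summing, so the concatenated tensor never has to be materialised. The sketch below checks this identity for a single position and a 1x1 kernel, where the convolution reduces to a matrix-vector product. All names and values are illustrative, and the actual block may differ in details such as normalisation and activation placement.

```python
from typing import List


def matvec(weights: List[List[float]], x: List[float]) -> List[float]:
    """Apply a 1x1 convolution at one position: one dot product per output channel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


upsampled = [1.0, 2.0]        # 2 channels from the upsampling path
skip = [3.0, 4.0, 5.0]        # 3 channels from the skip connection
weights = [[1.0, 0.0, 2.0, 0.0, 1.0],   # 2 output channels, 5 input channels
           [0.0, 1.0, 0.0, 3.0, 1.0]]

# Standard route: concatenate the channels first, then convolve.
concatenated = matvec(weights, upsampled + skip)

# Decomposed route: split the kernel, convolve each input separately, then sum.
w_up = [row[:len(upsampled)] for row in weights]
w_skip = [row[len(upsampled):] for row in weights]
decomposed = [a + b for a, b in zip(matvec(w_up, upsampled), matvec(w_skip, skip))]
```

Both routes produce the same two output values for this input.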

forward(x: Any, skip_connection: Any) → Any[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

get_all_child_layers() → List[Module][source]

partition_model(devices: Optional[List[device]] = None) → None[source]: A method to partition a neural network model across multiple devices. If no list of devices is given, use all available GPU devices.

summarizer: Optional[ModelSummary]

summary: Optional[OrderedDict]

summary_crop_size: Optional[TupleInt3]

training: bool

class InnerEye.ML.models.architectures.unet_2d.UNet2D(input_image_channels: int, initial_feature_channels: int, num_classes: int, num_downsampling_paths: int = 4, downsampling_dilation: int = 2, padding_mode: PaddingMode = PaddingMode.Zero)[source]

This class implements a UNet in 2 dimensions, with the input expected as a 3 dimensional tensor with a vanishing Z dimension.

get_all_child_layers() → List[Module][source]

summarizer: Optional[ModelSummary]

summary: Optional[OrderedDict]

summary_crop_size: Optional[TupleInt3]

training: bool

Classification

class InnerEye.ML.models.architectures.classification.bit.BiTResNetV2(num_groups: int = 32, num_classes: int = 21843, num_blocks_in_layer: Tuple[int, int, int, int] = (3, 4, 23, 3), width_factor: int = 1)[source]

Implements the Big Transfer (BiT) model

https://arxiv.org/pdf/1912.11370.pdf https://github.com/google-research/big_transfer

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

class InnerEye.ML.models.architectures.classification.bit.ResNetV2Block(in_channels: int, out_channels: int, bottleneck_channels: int, num_groups: int, downsample_stride: int = 1)[source]

ResNetV2 (https://arxiv.org/pdf/1603.05027.pdf) uses pre activation in the ResNet blocks. Big Transfer replaces BatchNorm with GroupNorm
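As a reminder of what group normalisation computes, here is a plain-Python sketch for a single sample, without the learnable scale and shift. It is not the InnerEye or PyTorch code; the function name `group_norm` and the list-of-lists layout (one inner list of values per channel) are assumptions for illustration.

```python
import math
from typing import List


def group_norm(channels: List[List[float]], num_groups: int,
               eps: float = 1e-5) -> List[List[float]]:
    """Normalise each group of channels to zero mean and (near) unit variance."""
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    group_size = len(channels) // num_groups
    output: List[List[float]] = []
    for start in range(0, len(channels), group_size):
        group = channels[start:start + group_size]
        # Statistics are shared by all channels and positions within the group.
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(variance + eps)
        output.extend([(v - mean) * scale for v in channel] for channel in group)
    return output


normalised = group_norm([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]],
                        num_groups=2)
```

Unlike batch normalisation, the statistics here depend only on the sample itself, so the result does not change with the batch size.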

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

class InnerEye.ML.models.architectures.classification.bit.ResNetV2Layer(in_channels: int, out_channels: int, bottleneck_channels: int, num_groups: int, downsample_stride: int, num_blocks: int)[source]

Single layer of ResNetV2

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

class InnerEye.ML.models.architectures.classification.image_encoder_with_mlp.ImageAndNonImageFeaturesAggregator[source]

Aggregator module to combine imaging and non imaging features by concatenating.

forward(*item: Tensor, **kwargs: Any) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

class InnerEye.ML.models.architectures.classification.image_encoder_with_mlp.ImageEncoder(imaging_feature_type: ImagingFeatureType = ImagingFeatureType.Image, encode_channels_jointly: bool = False, num_image_channels: int = 1, num_encoder_blocks: int = 5, initial_feature_channels: int = 32, num_non_image_features: int = 0, padding_mode: PaddingMode = PaddingMode.NoPadding, kernel_size_per_encoding_block: Union[Tuple[int, int, int], List[Tuple[int, int, int]]] = (1, 3, 3), stride_size_per_encoding_block: Union[Tuple[int, int, int], List[Tuple[int, int, int]]] = (1, 2, 2), encoder_dimensionality_reduction_factor: float = 0.8, aggregation_type: AggregationType = AggregationType.Average, scan_size: Optional[Tuple[int, int, int]] = None)[source]

An architecture for an image encoder that encodes the image with several UNet encoder blocks, and optionally appends non-imaging features to the encoder image features. This module hence creates the features to be used as an input for a classification or a regression module.

create_encoder(channels: List[int]) → ModuleList[source]: Create an image encoder network.

create_non_image_and_image_aggregator() → ImageAndNonImageFeaturesAggregator[source]

encode_and_aggregate(x: Tensor) → Tensor[source]

forward(*item: Tensor, **kwargs: Any) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

get_input_tensors(item: ScalarItem) → List[Tensor][source]

Transforms a classification item into a torch.Tensor that the forward pass can consume

Parameters:

item – ClassificationItem

Returns:

Tensor

get_last_encoder_layer_names() → List[str][source]: Return the name of the last encoder layers for GradCam. Default is an empty list.

training: bool

class InnerEye.ML.models.architectures.classification.image_encoder_with_mlp.ImageEncoderWithMlp(mlp_dropout: float = 0.5, final_activation: Module = Identity(), imaging_feature_type: ImagingFeatureType = ImagingFeatureType.Image, encode_channels_jointly: bool = False, num_image_channels: int = 1, num_encoder_blocks: int = 5, initial_feature_channels: int = 32, num_non_image_features: int = 0, padding_mode: PaddingMode = PaddingMode.NoPadding, kernel_size_per_encoding_block: Union[Tuple[int, int, int], List[Tuple[int, int, int]]] = (1, 3, 3), stride_size_per_encoding_block: Union[Tuple[int, int, int], List[Tuple[int, int, int]]] = (1, 2, 2), encoder_dimensionality_reduction_factor: float = 0.8, aggregation_type: AggregationType = AggregationType.Average, scan_size: Optional[Tuple[int, int, int]] = None)[source]

An architecture for an image classifier that first encodes the image with several UNet encoder blocks, and then feeds the resulting features through a multi layer perceptron (MLP). The architecture can handle multiple input channels. Each input channel is fed either through a separate UNet encoder pathway (if the argument encode_channels_jointly is False) or together with all other channels (if encode_channels_jointly is True). The latter makes the implicit assumption that the channels are spatially aligned.

forward(*item: Tensor, **kwargs: Any) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

last_encoder_layer: List[str]

training: bool

class InnerEye.ML.models.architectures.classification.image_encoder_with_mlp.ImagingFeatureType(value)[source]

An enumeration.

Image = 'Image'

ImageAndSegmentation = 'ImageAndSegmentation'

Segmentation = 'Segmentation'

InnerEye.ML.models.architectures.classification.image_encoder_with_mlp.create_mlp(input_num_feature_channels: int, dropout: float, final_output_channels: int = 1, final_layer: Optional[Module] = None, hidden_layer_num_feature_channels: Optional[int] = None) → MLP[source]

Create an MLP with 1 hidden layer.

Parameters:

input_num_feature_channels – The number of input channels to the first MLP layer.
dropout – The drop out factor that should be applied between the first and second MLP layer.
final_output_channels – if provided, the final number of output channels.
final_layer – if provided, the final (activation) layer to apply
hidden_layer_num_feature_channels – if provided, will be used as the number of channels of the hidden layer. If None, input_num_feature_channels // 2 will be used instead.
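The channel bookkeeping implied by these parameters can be sketched without building any torch modules. The helper `mlp_layer_channels` is hypothetical; it only shows how the default hidden width follows from the description above.

```python
from typing import List, Optional, Tuple


def mlp_layer_channels(input_num_feature_channels: int,
                       final_output_channels: int = 1,
                       hidden_layer_num_feature_channels: Optional[int] = None
                       ) -> List[Tuple[int, int]]:
    """Return (in_channels, out_channels) for the hidden and the output layer."""
    if hidden_layer_num_feature_channels is None:
        # Default described above: half the input width, using integer division.
        hidden = input_num_feature_channels // 2
    else:
        hidden = hidden_layer_num_feature_channels
    return [(input_num_feature_channels, hidden), (hidden, final_output_channels)]


default_layout = mlp_layer_channels(64)
custom_layout = mlp_layer_channels(64, final_output_channels=3,
                                   hidden_layer_num_feature_channels=10)
```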

InnerEye.ML.models.architectures.classification.image_encoder_with_mlp.encode_and_aggregate(input_tensor: Tensor, encoder: Module, num_encoder_input_channels: int, num_image_channels: int, encode_channels_jointly: bool, aggregation_layer: Callable) → Tensor[source]: Function that encodes a given input tensor either jointly using the encoder or separately for each channel in a sequential manner. The features obtained at the encoder output are then aggregated with the pooling function defined by aggregation_layer.

class InnerEye.ML.models.architectures.classification.segmentation_encoder.MultiSegmentationEncoder(num_image_channels: int, encode_channels_jointly: bool = False, use_mixed_precision: bool = True)[source]

encode_and_aggregate(input_tensor: Tensor) → Tensor[source]

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

get_input_tensors(item: ScalarItem) → List[Tensor][source]

Transforms a classification item into a torch.Tensor that the forward pass can consume

Parameters:

item – ClassificationItem

Returns:

Tensor

training: bool

class InnerEye.ML.models.architectures.classification.segmentation_encoder.SegmentationEncoder(in_channels: int)[source]

Implements the eye pathology classification model outlined in the following paper: De Fauw, Jeffrey, et al. “Clinically applicable deep learning for diagnosis and referral in retinal disease.” Nature medicine 24.9 (2018): 1342-1350. The model takes segmentation maps as input and outputs its most likely corresponding semantic class.

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

property out_channels: int: Gets the number of channels that this model will output.

training: bool

Others

class InnerEye.ML.models.architectures.complex.ComplexModel(args: SegmentationModelBase, full_channels_list: List[int], dilations: List[int], network_definition: List[List[Module]], crop_size_constraints: Optional[CropSizeConstraints] = None)[source]

A general class of feed-forward convolutional neural networks that is characterised by a network definition (list of lists of modules). It supports residual blocks, auto-focus and atrous spatial pyramid pooling layers.

forward(x: Any) → Any[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

get_all_child_layers() → List[Module][source]

summarizer: Optional[ModelSummary]

summary: Optional[OrderedDict]

summary_crop_size: Optional[TupleInt3]

training: bool

class InnerEye.ML.models.architectures.mlp.MLP(hidden_layers: List[HiddenLayer])[source]

An implementation of a Multilayer Perceptron, with Tanh activations, and the ability to configure dropout and batch norm layers.

class HiddenLayer(channels: Tuple[int, int], dropout: float = 0.0, use_layer_normalisation: bool = True, activation: Module = Identity())[source]

An implementation of a single Multilayer Perceptron hidden layer with Tanh activations, and the ability to configure dropout and batch norm layers.

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool

forward(x: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

get_input_tensors(item: ScalarItem) → List[Tensor][source]

Transforms a classification item into a torch.Tensor that the forward pass can consume

Parameters:

item – ClassificationItem

Returns:

Tensor

training: bool