Dataset

Full Datasets

class InnerEye.ML.dataset.full_image_dataset.FullImageDataset(args: SegmentationModelBase, data_frame: DataFrame, full_image_sample_transforms: Optional[Compose3D[Sample]] = None)[source]

Dataset class that loads and creates samples with full 3D images from a given pd.DataFrame. The following operations are performed to generate a sample from this dataset:

On initialization, parses the provided pd.DataFrame with dataset information to cache the set of file paths and patient mappings to load as PatientDatasetSource objects. The sources are then saved in a list, dataset_sources.
dataset_sources is iterated in batches: for each batch, the full 3D images are loaded and pre-processing functions (e.g. normalization) are applied, returning a sample that can be used for full image operations.

get_samples_at_index(index: int) → List[Sample][source]

class InnerEye.ML.dataset.full_image_dataset.GeneralDataset(args: D, data_frame: Optional[DataFrame] = None, name: Optional[str] = None)[source]

as_data_loader(shuffle: bool, batch_size: Optional[int] = None, num_dataload_workers: Optional[int] = None, use_imbalanced_sampler: bool = False, drop_last_batch: bool = False, max_repeats: Optional[int] = None) → DataLoader[source]

class InnerEye.ML.dataset.full_image_dataset.ImbalancedSampler(dataset: Any, num_samples: Optional[int] = None)[source]

Sampler that performs naive over-sampling by drawing samples with replacement. The probability of being drawn depends on the label of each data point: rare labels have a higher probability of being drawn. Assumes the dataset implements the “get_all_labels” function in order to compute the weights associated with each data point.

Side note: the sampler choice is independent from the data augmentation pipeline. Data augmentation is performed on the images while loading them at a later stage. This sampler merely affects which item is selected.

get_weights() → Tensor[source]
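The over-sampling idea described above can be sketched without PyTorch. This is a minimal illustration, not the ImbalancedSampler implementation: the helper names inverse_frequency_weights and draw_indices are hypothetical, and weighting each point by the inverse of its label frequency is an assumption about how "rare labels are drawn more often" could be realised.

```python
import random
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[float]) -> List[float]:
    """Weight each data point by 1 / (number of points sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def draw_indices(labels: Sequence[float], num_samples: int, seed: int = 0) -> List[int]:
    """Draw dataset indices with replacement, favouring points with rare labels."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Three negatives and one positive: the single positive gets three times the weight.
labels = [0.0, 0.0, 0.0, 1.0]
weights = inverse_frequency_weights(labels)
indices = draw_indices(labels, num_samples=8)
```

The sampler only decides which index is selected; any augmentation would still happen later, when the item at that index is loaded.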

class InnerEye.ML.dataset.full_image_dataset.RepeatDataLoader(dataset: Any, max_repeats: int, batch_size: int = 1, shuffle: bool = False, use_imbalanced_sampler: bool = False, drop_last: bool = False, **kwargs: Any)[source]

This class implements a data loader that avoids spawning a new process after each epoch. It uses an infinite sampler. This is adapted from https://github.com/pytorch/pytorch/issues/15849

batch_size: Optional[int]

dataset: Dataset[T_co]

drop_last: bool

num_workers: int

pin_memory: bool

prefetch_factor: int

sampler: Sampler

timeout: float
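The idea of reusing one iterator across epochs can be illustrated with plain Python. This is a sketch under stated assumptions, not the RepeatDataLoader code: RepeatingLoader and infinite_batches are hypothetical names, and lists of indices stand in for real batches. The point is that the underlying iterator is created once and never exhausted, so nothing has to be restarted between epochs.

```python
import itertools
from typing import Iterator, List, Sequence


def infinite_batches(indices: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield batches forever by cycling over the index sequence."""
    endless = itertools.cycle(indices)
    while True:
        yield list(itertools.islice(endless, batch_size))


class RepeatingLoader:
    """Keeps one long-lived batch iterator and hands out a fixed number of batches per epoch."""

    def __init__(self, indices: Sequence[int], batch_size: int) -> None:
        self.batches_per_epoch = len(indices) // batch_size
        # Created once; every epoch continues from where the last one stopped.
        self._iterator = infinite_batches(indices, batch_size)

    def __iter__(self) -> Iterator[List[int]]:
        for _ in range(self.batches_per_epoch):
            yield next(self._iterator)


loader = RepeatingLoader(indices=[0, 1, 2, 3], batch_size=2)
epoch_1 = list(loader)
epoch_2 = list(loader)
```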

InnerEye.ML.dataset.full_image_dataset.collate_with_metadata(batch: List[Dict[str, Any]]) → Dict[str, Any][source]

The collate function that the dataloader workers should use. It does the same thing for all “normal” fields (all fields are put into tensors with outer dimension batch_size), except for the special “metadata” field. Those metadata objects are collated into a simple list.

Parameters: batch – A list of samples that should be collated.
Returns: The collated result.
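The special handling of the “metadata” field can be sketched as follows. This is an illustration only: collate_keeping_metadata is a hypothetical name, and the stack argument (defaulting to tuple) is a stand-in for the tensor stacking that a real collate function would perform.

```python
from typing import Any, Callable, Dict, List


def collate_keeping_metadata(batch: List[Dict[str, Any]],
                             stack: Callable[[List[Any]], Any] = tuple) -> Dict[str, Any]:
    """Group per-sample fields by key; 'metadata' stays a plain list."""
    collated: Dict[str, Any] = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        if key == "metadata":
            collated[key] = values         # metadata objects are not stacked
        else:
            collated[key] = stack(values)  # stand-in for stacking into a tensor
    return collated


batch = [{"image": 1, "metadata": "a"}, {"image": 2, "metadata": "b"}]
collated = collate_keeping_metadata(batch)
```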

InnerEye.ML.dataset.full_image_dataset.convert_channels_to_file_paths(channels: List[str], rows: DataFrame, local_dataset_root_folder: Path, patient_id: str, allow_incomplete_labels: bool = False) → Tuple[List[Optional[Path]], str][source]

Returns: 1) a list of file path objects specified in the training, validation and testing datasets, and 2) a string describing missing channels, missing files, and cases with more than one channel per patient.

Parameters:

channels – channel type defined in the configuration file
rows – Input Pandas dataframe object containing subjectIds, path of local dataset, channel information
local_dataset_root_folder – Root directory which points to the local dataset
patient_id – string which contains subject identifier
allow_incomplete_labels – boolean flag. If false, all ground truth files must be provided. If true, ground truth files are optional

InnerEye.ML.dataset.full_image_dataset.load_dataset_sources(dataframe: DataFrame, local_dataset_root_folder: Path, image_channels: List[str], ground_truth_channels: List[str], mask_channel: Optional[str], allow_incomplete_labels: bool = False) → Dict[str, PatientDatasetSource][source]

Prepares a patient-to-images mapping from a dataframe read directly from a dataset CSV file. The dataframe contains per-patient, per-channel image information, relative to a root directory. This method converts that into a per-patient dictionary that contains absolute file paths, separated into image channels, ground truth channels, and mask channels.

Parameters:

dataframe – A dataframe read directly from a dataset CSV file.
local_dataset_root_folder – The root folder that contains all images.
image_channels – The names of the image channels that should be used in the result.
ground_truth_channels – The names of the ground truth channels that should be used in the result.
mask_channel – The name of the mask channel that should be used in the result. This can be None.
allow_incomplete_labels – Boolean flag. If false, all ground truth files must be provided. If true, ground truth files are optional. Default value is false.

Returns:

A dictionary mapping from a subject ID (a string, as per the return type Dict[str, PatientDatasetSource]) to a PatientDatasetSource.
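The per-patient grouping described above might look roughly like this. The sketch uses a list of dictionaries in place of a dataframe; the column names “subject”, “channel” and “filePath”, the function name group_paths_by_subject, and the shape of the result are all assumptions for illustration, not the library's actual schema or return type.

```python
from pathlib import Path
from typing import Dict, List


def group_paths_by_subject(rows: List[Dict[str, str]], root: Path,
                           image_channels: List[str],
                           ground_truth_channels: List[str]) -> Dict[str, Dict[str, List[Path]]]:
    """Turn per-subject, per-channel rows into one entry per subject with full paths."""
    result: Dict[str, Dict[str, List[Path]]] = {}
    for row in rows:
        entry = result.setdefault(row["subject"], {"image": [], "ground_truth": []})
        full_path = root / row["filePath"]  # paths in the rows are relative to the root folder
        if row["channel"] in image_channels:
            entry["image"].append(full_path)
        elif row["channel"] in ground_truth_channels:
            entry["ground_truth"].append(full_path)
    return result


rows = [
    {"subject": "1", "channel": "ct", "filePath": "1/ct.nii.gz"},
    {"subject": "1", "channel": "lung", "filePath": "1/lung.nii.gz"},
    {"subject": "2", "channel": "ct", "filePath": "2/ct.nii.gz"},
]
sources = group_paths_by_subject(rows, Path("/data"), ["ct"], ["lung"])
```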

class InnerEye.ML.dataset.sample.CroppedSample(image: Union[ndarray, Tensor], mask: Union[ndarray, Tensor], labels: Union[ndarray, Tensor], metadata: PatientMetadata, mask_center_crop: Union[Tensor, ndarray], labels_center_crop: Union[Tensor, ndarray], center_indices: Union[Tensor, ndarray])[source]

Instance of a dataset sample (compatible with PyTorch data loader) used for training that contains (possibly) cropped images as well as the center crops for the mask and the labels.

center_indices: Union[Tensor, ndarray]

labels_center_crop: Union[Tensor, ndarray]

mask_center_crop: Union[Tensor, ndarray]

class InnerEye.ML.dataset.sample.GeneralSampleMetadata(id: str, props: Dict[str, Any] = <factory>, sequence_position: int = 0)[source]

A very generic class to store information about a sample inside of a dataset. Each sample has a string identifier, and a dictionary for attributes.

id: str

props: Dict[str, Any]

sequence_position: int = 0

class InnerEye.ML.dataset.sample.PatientDatasetSource(image_channels: List[Union[Path, str]], ground_truth_channels: List[Optional[Union[Path, str]]], mask_channel: Optional[Union[Path, str]], metadata: PatientMetadata, allow_incomplete_labels: Optional[bool] = False)[source]

Dataset source locations for channels associated with a given patient in a particular dataset.

allow_incomplete_labels: Optional[bool] = False

ground_truth_channels: List[Optional[Union[Path, str]]]

image_channels: List[Union[Path, str]]

mask_channel: Optional[Union[Path, str]]

metadata: PatientMetadata

class InnerEye.ML.dataset.sample.PatientMetadata(patient_id: str, image_header: Optional[ImageHeader] = None, institution: Optional[str] = None, series: Optional[str] = None, tags_str: Optional[str] = None)[source]

Patient metadata

static from_dataframe(dataframe: DataFrame, patient_id: str) → PatientMetadata[source]

Extracts the patient metadata columns from a dataframe that represents a full dataset. For each of the columns “seriesId”, “institutionId” and “tags”, the distinct values for the given patient are computed. If there is exactly 1 distinct value, that is returned as the respective patient metadata. If there is more than 1 distinct value, the metadata column is set to None.

Parameters:

dataframe – The dataset to read from.
patient_id – The ID of the patient for which the metadata should be extracted.

Returns:

An instance of PatientMetadata for the given patient_id
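The “exactly one distinct value” rule can be sketched on plain rows. This is a hedged illustration: single_distinct_value is a hypothetical helper, the “subject” column name is an assumption, and a list of dictionaries stands in for the dataframe.

```python
from typing import Dict, List, Optional


def single_distinct_value(rows: List[Dict[str, str]], patient_id: str, column: str) -> Optional[str]:
    """Return the column value if the patient has exactly one distinct value, else None."""
    values = {row[column] for row in rows if row["subject"] == patient_id}
    return values.pop() if len(values) == 1 else None


rows = [
    {"subject": "1", "institutionId": "A", "seriesId": "s1"},
    {"subject": "1", "institutionId": "A", "seriesId": "s2"},
]
institution = single_distinct_value(rows, "1", "institutionId")  # one distinct value
series = single_distinct_value(rows, "1", "seriesId")            # two distinct values
```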

image_header: Optional[ImageHeader] = None

institution: Optional[str] = None

patient_id: str

series: Optional[str] = None

tags_str: Optional[str] = None

class InnerEye.ML.dataset.sample.Sample(image: Union[ndarray, Tensor], mask: Union[ndarray, Tensor], labels: Union[ndarray, Tensor], metadata: PatientMetadata)[source]

Instance of a dataset sample that contains full 3D images, and is compatible with PyTorch data loader.

image: Union[ndarray, Tensor]

property image_spacing: Tuple[float, float, float]

labels: Union[ndarray, Tensor]

mask: Union[ndarray, Tensor]

metadata: PatientMetadata

property patient_id: int

class InnerEye.ML.dataset.sample.SampleBase[source]

All flavours of dataset samples should inherit from this class.

clone_with_overrides(**overrides: Any) → T[source]

Create a clone of the current sample, with the provided overrides to replace the existing properties if they exist.

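Parameters: overrides – Keyword arguments naming the properties to replace, with their new values.
Returns: A clone of the current sample with the overrides applied.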

classmethod from_dict(sample: Dict[str, Any]) → T[source]

Create an instance of the sample class, based on the provided sample dictionary

Parameters: sample – Dictionary of arguments.
Returns: An instance of the SampleBase class.

get_dict() → Dict[str, Any][source]: Get the current sample as a dictionary of property names and their values.
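The three methods above follow a pattern that is easy to reproduce with a dataclass. The sketch below is not the SampleBase code; ToySample and its fields are hypothetical, and using dataclasses.replace for cloning is an assumption about one reasonable way to implement the behaviour.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ToySample:
    image: str
    patient_id: int

    def get_dict(self) -> Dict[str, Any]:
        """Return the sample as a dictionary of property names and values."""
        return {f.name: getattr(self, f.name) for f in fields(self)}

    @classmethod
    def from_dict(cls, sample: Dict[str, Any]) -> "ToySample":
        """Build a sample from a dictionary of constructor arguments."""
        return cls(**sample)

    def clone_with_overrides(self, **overrides: Any) -> "ToySample":
        """Copy the sample, replacing only the properties named in overrides."""
        return replace(self, **overrides)


original = ToySample(image="scan_a", patient_id=1)
clone = original.clone_with_overrides(patient_id=2)
round_trip = ToySample.from_dict(original.get_dict())
```

In this sketch, passing an override for a name that is not a field raises a TypeError, which is stricter than "replace the existing properties if they exist" may imply.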

Scalar Datasets

class InnerEye.ML.dataset.scalar_dataset.DataSourceReader(data_frame: DataFrame, label_value_column: str, image_file_column: Optional[str] = None, image_channels: Optional[List[str]] = None, label_channels: Optional[List[str]] = None, transform_labels: Union[Callable, List[Callable]] = <function LabelTransformation.identity>, non_image_feature_channels: Optional[Dict[str, List[str]]] = None, numerical_columns: Optional[List[str]] = None, sequence_column: Optional[str] = None, subject_column: str = 'subject', channel_column: str = 'channel', is_classification_dataset: bool = True, num_classes: int = 1, categorical_data_encoder: Optional[CategoricalToOneHotEncoder] = None)[source]

Class that allows reading of data sources from a scalar dataset data frame.

load_data_sources(num_dataset_reader_workers: int = 0) → List[ScalarDataSource][source]

Extracts information from a dataframe to create a list of ClassificationItem. This will create one entry per unique value of subject_id in the dataframe. The file is structured around “channels”, indicated by specific values in the CSV_CHANNEL_HEADER column. The result contains paths to image files, a label vector, and a matrix of additional values that are specified by rows and columns given in non_image_feature_channels and numerical_columns.

Parameters: num_dataset_reader_workers – Number of worker processes to use. If 0, execution is single-threaded; if -1, multiprocessing with all available CPUs is used.
Returns: A list of ScalarDataSource or SequenceDataSource instances.

static load_data_sources_as_per_config(data_frame: DataFrame, args: ScalarModelBase) → List[ScalarDataSource][source]

Loads dataset items from the given dataframe, where all column and channel configurations are taken from their respective model config elements.

Parameters:

data_frame – The dataframe to read dataset items from.
args – The model configuration object.

Returns:

A list of all dataset items that could be read from the dataframe.

load_datasources_for_subject(subject_id: str) → Optional[List[ScalarDataSource]][source]

class InnerEye.ML.dataset.scalar_dataset.ScalarDataset(args: ScalarModelBase, data_frame: Optional[DataFrame] = None, feature_statistics: Optional[FeatureStatistics] = None, name: Optional[str] = None, sample_transform: Callable[[ScalarItem], ScalarItem] = <InnerEye.ML.dataset.scalar_dataset.ScalarItemAugmentation object>)[source]

A dataset class that can read CSV files with a flexible schema, and extract image file paths and non-image features.

filter_valid_data_sources_items(data_sources: List[ScalarDataSource]) → List[ScalarDataSource][source]

get_class_counts() → Dict[int, int][source]

Return the label counts as a dictionary with the key-value pairs being the class indices and per-class counts. In the binary case, the dictionary will have a single element. The key will be 0 as there is only one class and one class index. The value stored will be the number of samples that belong to the positive class. In the multilabel case, this returns a dictionary with class indices and samples per class as the key-value pairs.

Returns: Dictionary of {class_index: count}
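The counting rule can be illustrated with label vectors held as plain lists. This is a sketch, not the get_class_counts implementation: class_counts is a hypothetical name, and it assumes each label is a vector of 0/1 entries with one entry per class.

```python
from typing import Dict, List


def class_counts(labels: List[List[float]]) -> Dict[int, int]:
    """Count positive samples per class index; each label is a vector of 0/1 entries."""
    num_classes = len(labels[0])
    return {c: int(sum(label[c] for label in labels)) for c in range(num_classes)}


# Binary case: one class index (0), counting the positive samples.
binary = class_counts([[1.0], [0.0], [1.0]])
# Multilabel case: one count per class index.
multilabel = class_counts([[1.0, 0.0], [1.0, 1.0]])
```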

get_labels_for_imbalanced_sampler() → List[float][source]: Returns a list of all the labels in the dataset. Used to compute the sampling weights in Imbalanced Sampler

get_status() → str[source]: Creates a human readable string that describes the contents of the dataset.

items: List[ScalarDataSource]

class InnerEye.ML.dataset.scalar_dataset.ScalarDatasetBase(args: ScalarModelBase, data_frame: Optional[DataFrame] = None, feature_statistics: Optional[FeatureStatistics] = None, name: Optional[str] = None, sample_transform: Callable[[ScalarItem], ScalarItem] = <InnerEye.ML.dataset.scalar_dataset.ScalarItemAugmentation object>)[source]

A base class for datasets for classification tasks. It contains logic for loading images from disk, either from a fixed folder or traversing into subfolders.

create_status_string(items: List[ScalarDataSource]) → str[source]

Creates a human readable string that contains the number of items, and the distinct number of subjects.

Parameters: items – Use the items provided to create the string.
Returns: A string like “12 items for 5 subjects”.

filter_valid_data_sources_items(data_sources: List[ScalarDataSource]) → List[ScalarDataSource][source]

abstract get_labels_for_imbalanced_sampler() → List[float][source]

items: List[ScalarDataSource]

load_all_data_sources() → List[ScalarDataSource][source]

Uses the dataframe to create data sources to be used by the dataset.

Returns: List of data sources.

load_item(item: ScalarDataSource) → ScalarItem[source]

Loads the images and/or segmentations as given in the ClassificationDataSource item and applies the optional transformation specified by the class.

Parameters: item – The item to load.
Returns: A ClassificationItem instance with the loaded images, and the labels and non-image features copied from the argument.

one_hot_encoder: Optional[CategoricalToOneHotEncoder] = None

standardize_non_imaging_features() → None[source]: Modifies the non image features that this data loader stores, such that they have mean 0, variance 1. Mean and variances are either taken from the argument feature_mean_and_variance (use that when the data set contains validation or test sequences), or computed from the dataset itself (use for the training set). If None, they will be computed from the data in the present object.
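The standardization described above can be sketched for a single feature column. The helper name standardize and the use of the population standard deviation are assumptions; the library may compute its statistics differently. The stats argument plays the role of precomputed statistics that would be passed in for validation or test data.

```python
import statistics
from typing import List, Optional, Tuple


def standardize(values: List[float],
                stats: Optional[Tuple[float, float]] = None) -> Tuple[List[float], Tuple[float, float]]:
    """Shift and scale values to mean 0, variance 1, computing the statistics if none are given."""
    if stats is None:
        stats = (statistics.fmean(values), statistics.pstdev(values))
    mean, std = stats
    return [(v - mean) / std for v in values], stats


# Training set: statistics are computed from the data itself.
train_std, train_stats = standardize([1.0, 2.0, 3.0])
# Validation set: reuse the training statistics rather than recomputing them.
val_std, _ = standardize([2.0], stats=train_stats)
```

A constant feature has zero standard deviation and would cause a division by zero here, so a real implementation needs to guard against that case.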

status: str = ''

class InnerEye.ML.dataset.scalar_dataset.ScalarItemAugmentation(image_transform: Optional[Callable] = None, segmentation_transform: Optional[Callable] = None)[source]: Wrapper around the augmentation pipeline that applies image and/or segmentation transformations to a ScalarItem input.

InnerEye.ML.dataset.scalar_dataset.extract_label_classification(label_string: str, sample_id: str, num_classes: int, is_classification_dataset: bool) → List[float][source]

Converts a string from a dataset.csv file that contains a model’s label to a scalar.

For classification datasets:

If num_classes is 1 (binary classification tasks): The function maps [“1”, “true”, “yes”] to [1], and [“0”, “false”, “no”] to [0]. If the entry in the CSV file was missing (no string given at all) or an empty string, it returns math.nan.
If num_classes is greater than 1 (multilabel datasets): The function maps a pipe-separated set of classes to a tensor with ones at the indices of the positive classes and 0 elsewhere (for example, in a task with 6 label classes, “1|3|4” maps to [0, 1, 0, 1, 1, 0]). If the entry in the CSV file was missing (no string given at all) or an empty string, this function returns an all-zero tensor (none of the label classes were positive for this sample).

For regression datasets: The function casts a string label to float. Raises an exception if the conversion is not possible. If the entry in the CSV file was missing (no string given at all) or an empty string, it returns math.nan.

Parameters:

label_string – The value of the label as read from CSV via a DataFrame.
sample_id – The sample ID where this label was read from. This is only used for creating error messages.
num_classes – Number of classes. This should be equal to the size of the model output. For binary classification tasks, num_classes should be one. For multilabel classification tasks, num_classes should correspond to the number of label classes in the problem.
is_classification_dataset – If the model is a classification model

Returns:

A list of floats with the same size as num_classes
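The mapping rules above can be condensed into a short sketch. This is not the extract_label_classification source: parse_label is a hypothetical name, missing entries are modelled as None or an empty string (a dataframe would typically carry NaN instead), and matching the binary strings exactly, without case folding, is an assumption.

```python
import math
from typing import List, Optional


def parse_label(label_string: Optional[str], num_classes: int, is_classification: bool) -> List[float]:
    """Convert a CSV label string into a list of floats of length num_classes."""
    text = (label_string or "").strip()
    if not is_classification:
        # Regression: cast to float; float() raises ValueError on bad input.
        return [float(text)] if text else [math.nan]
    if num_classes == 1:
        if not text:
            return [math.nan]
        if text in ("1", "true", "yes"):
            return [1.0]
        if text in ("0", "false", "no"):
            return [0.0]
        raise ValueError(f"Unrecognised binary label: {label_string!r}")
    # Multilabel: pipe-separated class indices become ones in an otherwise zero vector.
    one_hot = [0.0] * num_classes
    for index in (text.split("|") if text else []):
        one_hot[int(index)] = 1.0
    return one_hot


multilabel = parse_label("1|3|4", num_classes=6, is_classification=True)
missing_multilabel = parse_label(None, num_classes=3, is_classification=True)
binary = parse_label("yes", num_classes=1, is_classification=True)
regression = parse_label("2.5", num_classes=1, is_classification=False)
```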

InnerEye.ML.dataset.scalar_dataset.files_by_stem(root_path: Path) → Dict[str, Path][source]

Lists all files under the given root directory recursively, and returns a mapping from file name stem to full path. The file name stem is computed more restrictively than what Path.stem returns: file.nii.gz will use “file” as the stem, not “file.nii” as Path.stem would. Only actual files are returned in the mapping, no directories. If there are multiple files that map to the same stem, the function raises a ValueError.

Parameters: root_path – The root directory from which the file search should start.
Returns: A dictionary mapping from file name stem to the full path to where the file is found.
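A minimal sketch of this behaviour with pathlib follows. It is an illustration rather than the files_by_stem source: stems_to_paths is a hypothetical name, and taking everything before the first dot as the stem is an assumption that matches the file.nii.gz example.

```python
import tempfile
from pathlib import Path
from typing import Dict


def stems_to_paths(root_path: Path) -> Dict[str, Path]:
    """Map each restrictive file name stem under root_path to the file's full path."""
    mapping: Dict[str, Path] = {}
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue  # directories are not part of the mapping
        stem = path.name.split(".")[0]  # "file.nii.gz" -> "file", unlike Path.stem
        if stem in mapping:
            raise ValueError(f"Stem {stem!r} maps to both {mapping[stem]} and {path}")
        mapping[stem] = path
    return mapping


with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    (root / "sub").mkdir()
    (root / "scan.nii.gz").touch()
    (root / "sub" / "mask.nii").touch()
    found = stems_to_paths(root)
    found_names = {stem: path.name for stem, path in found.items()}
```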

InnerEye.ML.dataset.scalar_dataset.filter_valid_classification_data_sources_items(items: Iterable[ScalarDataSource], file_to_path_mapping: Optional[Dict[str, Path]], max_sequence_position_value: Optional[int] = None, min_sequence_position_value: int = 0) → List[ScalarDataSource][source]

Consumes a list of classification data sources, and removes all of those that have missing file names, or that have NaN or Inf features. If the file_to_path_mapping is given too, all items that have any missing files (files not present on disk) are dropped too. Items that have sequence position larger than the max_sequence_position_value are removed.

Parameters:

items – The list of items to filter.
min_sequence_position_value – Restrict the data to items with a metadata.sequence_position that is at least the value given here. Default is 0.
max_sequence_position_value – If provided then this is the maximum sequence position the sequence can end with. Longer sequences will be truncated. None is default.
file_to_path_mapping – A mapping from a file name stem (without extension) to its full path.

Returns:

A list of items, all of which are valid now.

InnerEye.ML.dataset.scalar_dataset.is_valid_item_index(item: ScalarDataSource, max_sequence_position_value: Optional[int], min_sequence_position_value: int = 0) → bool[source]

Returns True if the item metadata in metadata.sequence_position is a valid sequence index.

Parameters:

item – The item to check.
min_sequence_position_value – Check if the item has a metadata.sequence_position that is at least the value given here. Default is 0.
max_sequence_position_value – If provided then this is the maximum sequence position the sequence can end with. Longer sequences will be truncated. None is default.

Returns:

True if the item has a valid index.
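The range check can be expressed in a few lines. The sketch below works on a bare integer rather than a ScalarDataSource; position_in_range is a hypothetical name, and treating both bounds as inclusive follows the wording "at least" and "larger than ... are removed" in the descriptions above.

```python
from typing import Optional


def position_in_range(position: int, max_position: Optional[int], min_position: int = 0) -> bool:
    """True if min_position <= position <= max_position, with no upper bound when max is None."""
    if position < min_position:
        return False
    return max_position is None or position <= max_position


kept = [p for p in range(6) if position_in_range(p, max_position=3, min_position=1)]
```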

InnerEye.ML.dataset.scalar_dataset.load_single_data_source(subject_rows: DataFrame, subject_id: str, label_value_column: str, channel_column: str, image_channels: Optional[List[str]] = None, image_file_column: Optional[str] = None, label_channels: Optional[List[str]] = None, transform_labels: Union[Callable, List[Callable]] = <function LabelTransformation.identity>, non_image_feature_channels: Optional[Dict] = None, numerical_columns: Optional[List[str]] = None, categorical_data_encoder: Optional[CategoricalToOneHotEncoder] = None, metadata_columns: Optional[Set[str]] = None, is_classification_dataset: bool = True, num_classes: int = 1, sequence_position_numeric: Optional[int] = None) → ScalarDataSource[source]

Converts a set of dataset rows for a single subject to a ScalarDataSource instance, which contains the labels, the non-image features, and the paths to the image files.

Parameters:

num_classes – Number of classes, this is equivalent to model output tensor size
channel_column – The name of the column that contains the row identifier (“channels”)
metadata_columns – A list of columns that will be added to the item metadata as key/value pairs.
subject_rows – All dataset rows that belong to the same subject.
subject_id – The identifier of the subject that is being processed.
image_channels – The names of all channels (stored in the CSV_CHANNEL_HEADER column of the dataframe) that are expected to be loaded from disk later because they are large images.
image_file_column – The name of the column that contains the image file names.
label_channels – The name of the channel where the label scalar or vector is read from.
label_value_column – The column that contains the value for the label scalar or vector.
non_image_feature_channels – A dictionary of the names of all channels where additional scalar values should be read from. The keys should map each feature to its channels.
numerical_columns – The names of all columns where additional scalar values should be read from.
categorical_data_encoder – Encoding scheme for categorical data.
is_classification_dataset – If True, the dataset will be used in a classification model. If False, assume that the dataset will be used in a regression model.
transform_labels – a label transformation or a list of label transformation to apply to the labels. If a list is provided, the transformations are applied in order from left to right.
sequence_position_numeric – Numeric position of the data source in a data sequence. Assumed to be a non-sequential dataset item if None provided (default).

Returns:

A ScalarDataSource containing the specified data.
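The left-to-right application of transform_labels can be sketched with functools.reduce. This is an illustration only: apply_transforms is a hypothetical helper, and labels are plain floats here rather than tensors.

```python
from functools import reduce
from typing import Callable, List, Union

LabelTransform = Callable[[float], float]


def apply_transforms(label: float, transforms: Union[LabelTransform, List[LabelTransform]]) -> float:
    """Apply a single transformation, or a list of them in order from left to right."""
    if callable(transforms):
        transforms = [transforms]
    return reduce(lambda value, transform: transform(value), transforms, label)


# The first transform adds 1, the second doubles: (3 + 1) * 2.
result = apply_transforms(3.0, [lambda x: x + 1.0, lambda x: x * 2.0])
unchanged = apply_transforms(3.0, lambda x: x)
```

Order matters: swapping the two transforms above would give 3 * 2 + 1 instead.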

class InnerEye.ML.dataset.scalar_sample.ScalarDataSource(metadata: 'GeneralSampleMetadata', label: 'torch.Tensor', numerical_non_image_features: 'torch.Tensor', categorical_non_image_features: 'torch.Tensor', channel_files: 'List[Optional[str]]')[source]

channel_files: List[Union[None, str]]

files_valid() → bool[source]

get_all_image_filepaths(root_path: Optional[Path], file_mapping: Optional[Dict[str, Path]]) → List[Path][source]

Get a list of image paths for the object. Either root_path or file_mapping must be specified.

Parameters:

root_path – The root path where all channel files for images are expected. This is ignored if file_mapping is given.
file_mapping – A mapping from a file name stem (without extension) to its full path.

static get_full_image_filepath(file: str, root_path: Optional[Path], file_mapping: Optional[Dict[str, Path]]) → Path[source]

Get the full path of an image file given the path relative to the dataset folder and one of root_path or file_mapping.

Parameters:

file – Image filepath relative to the dataset folder
root_path – The root path where all channel files for images are expected. This is ignored if file_mapping is given.
file_mapping – A mapping from a file name stem (without extension) to its full path.

is_valid() → bool[source]

Checks if all file paths and non-image features are present in the object. All image channel files must be not None, and none of the non imaging features may be NaN or infinity.

Returns: True if channel files is a list with not-None entries, and all non imaging features are finite floating point numbers.

load_images(root_path: Optional[Path], file_mapping: Optional[Dict[str, Path]], load_segmentation: bool, center_crop_size: Optional[Tuple[int, int, int]], image_size: Optional[Tuple[int, int, int]]) → ScalarItem[source]

Loads all the images that are specified in the channel_files field, and stacks them into a tensor along the first dimension. The channel_files field must either contain the image file path, relative to the root_path argument, or it must contain a file name stem only (without extension). In this case, the actual mapping from file name stem to full path is expected in the file_mapping argument. Either of ‘root_path’ or ‘file_mapping’ must be provided.

Parameters:

root_path – The root path where all channel files for images are expected. This is ignored if file_mapping is given.
file_mapping – A mapping from a file name stem (without extension) to its full path.
load_segmentation – If True it loads segmentation if present on the same file as the image.
center_crop_size – If supplied, all loaded images will be cropped to the size given here. The crop will be taken from the center of the image.
image_size – If given, all loaded images will be reshaped to the size given here, prior to the center crop.

Returns:

An instance of ClassificationItem, with the same label and numerical_non_image_features fields, and all images loaded.

class InnerEye.ML.dataset.scalar_sample.ScalarItem(metadata: GeneralSampleMetadata, label: Tensor, numerical_non_image_features: Tensor, categorical_non_image_features: Tensor, images: Tensor, segmentations: Optional[Tensor])[source]

This class contains all information that is input to an image classification model, including the images themselves. Labels and numerical_non_image_features can be matrices of arbitrary size.

get_all_non_imaging_features() → Tensor[source]: Returns a concatenation of the numerical_non_image_features and categorical_non_image_features

images: Tensor

segmentations: Optional[Tensor]

to_device(device: Any) → ScalarItem[source]

Creates a copy of the present object where all tensors live on the given CUDA device. The metadata field is left unchanged.

Parameters: device – The CUDA or GPU device to move to.
Returns: A new ScalarItem with all tensors on the chosen device.

class InnerEye.ML.dataset.scalar_sample.ScalarItemBase(metadata: GeneralSampleMetadata, label: Tensor, numerical_non_image_features: Tensor, categorical_non_image_features: Tensor)[source]

This class contains all information that is input to an image classification model, apart from the image itself. Labels and numerical_non_image_features can be matrices of arbitrary size.

categorical_non_image_features: Tensor

features_valid() → bool[source]: Returns True if numerical_non_image_features and categorical_non_image_features are valid, i.e. none of the elements in the tensors are Not a Number.

property id: str: Gets the identifier of the present object from metadata.

is_valid() → bool[source]: Returns True if numerical_non_image_features, categorical_non_image_features and label are valid, i.e. none of the elements in the tensors are either Not a Number or Infinity.

label: Tensor

labels_valid() → bool[source]: Checks that the label tensor is valid, i.e. none of the elements in the tensor are either Not a Number or Infinity.

metadata: GeneralSampleMetadata

numerical_non_image_features: Tensor

property props: Dict[str, Any]: Gets the general metadata dictionary for the present object.

class InnerEye.ML.dataset.scalar_sample.SequenceDataSource(metadata: 'GeneralSampleMetadata', label: 'torch.Tensor', numerical_non_image_features: 'torch.Tensor', categorical_non_image_features: 'torch.Tensor', channel_files: 'List[Optional[str]]')[source]

categorical_non_image_features: Tensor

channel_files: List[Union[None, str]]

label: Tensor

labels_valid() → bool[source]: Checks that the label tensor is valid, i.e. none of the elements in the tensor are either Not a Number or Infinity.

metadata: GeneralSampleMetadata

numerical_non_image_features: Tensor

CroppingDataset

class InnerEye.ML.dataset.cropping_dataset.CroppingDataset(args: SegmentationModelBase, data_frame: DataFrame, cropped_sample_transforms: Optional[Compose3D[CroppedSample]] = None, full_image_sample_transforms: Optional[Compose3D[Sample]] = None)[source]

Dataset class that creates random cropped samples from full 3D images from a given pd.DataFrame. The crops extracted are of size crop_size, which is defined in the model config, and the crop center class population is distributed as per the class_weights vector in the model config (which by default weights all classes equally).

__getitem__(i: int) → Dict[str, Any][source]

static create_possibly_padded_sample_for_cropping(sample: Sample, crop_size: Tuple[int, int, int], padding_mode: PaddingMode) → Sample[source]

Pads the original sample such that the provided images have the same shape as the requested crop size (or slightly larger in case of an uneven difference), using the provided padding mode.

Parameters:

sample – Sample to pad.
crop_size – Crop size to match.
padding_mode – The padding scheme to apply.

Returns:

padded sample
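The remark about a "slightly larger" result can be made concrete with a small calculation. The sketch below assumes that the same amount of padding is added on both sides of each dimension, rounded up when the difference is odd; this is one reading of the description, not a confirmed account of the implementation, and symmetric_padding is a hypothetical name.

```python
import math
from typing import Tuple


def symmetric_padding(image_shape: Tuple[int, int, int],
                      crop_size: Tuple[int, int, int]) -> Tuple[Tuple[int, int], ...]:
    """Per-dimension (before, after) padding so that each dimension reaches at least crop_size."""
    pads = []
    for dim, crop in zip(image_shape, crop_size):
        per_side = math.ceil(max(0, crop - dim) / 2)  # equal padding on both sides
        pads.append((per_side, per_side))
    return tuple(pads)


shape = (10, 7, 4)
pads = symmetric_padding(shape, crop_size=(8, 8, 8))
padded_shape = tuple(d + before + after for d, (before, after) in zip(shape, pads))
```

In this example, the first dimension is already large enough and is left alone, the second differs by 1 and ends up one voxel larger than the crop size, and the third is padded to exactly the crop size.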

static create_random_cropped_sample(sample: Sample, crop_size: Tuple[int, int, int], center_size: Tuple[int, int, int], class_weights: Optional[List[float]] = None) → CroppedSample[source]

Creates an instance of a cropped sample extracted from full 3D images.

Parameters:

sample – the full size 3D sample to use for extracting a cropped sample.
crop_size – the size of the crop to extract.
center_size – the size of the center of the crop (this should be the same as the spatial dimensions of the posteriors that the model produces)
class_weights – the distribution to use for the crop center class.

Returns:

CroppedSample
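How class_weights could steer the choice of crop center is sketched below. This is a simplified illustration under assumptions, not the library's sampling code: pick_crop_center is a hypothetical name, voxel coordinates are pre-grouped by class in a dictionary, and the center is chosen by first drawing a class according to the weights and then drawing a voxel of that class uniformly.

```python
import random
from typing import Dict, List, Optional, Tuple

Voxel = Tuple[int, int, int]


def pick_crop_center(voxels_by_class: Dict[int, List[Voxel]],
                     class_weights: Optional[List[float]] = None,
                     seed: int = 0) -> Tuple[int, Voxel]:
    """Draw a class according to class_weights, then a voxel of that class as the crop center."""
    rng = random.Random(seed)
    # Only classes that actually occur in the image can provide a center.
    classes = sorted(c for c, voxels in voxels_by_class.items() if voxels)
    if class_weights is None:
        weights = [1.0] * len(classes)  # default: all classes equally likely
    else:
        weights = [class_weights[c] for c in classes]
    chosen = rng.choices(classes, weights=weights, k=1)[0]
    return chosen, rng.choice(voxels_by_class[chosen])


voxels = {0: [(0, 0, 0), (0, 0, 1)], 1: [(5, 5, 5)]}
# With all weight on class 1, the center always comes from class 1.
chosen_class, center = pick_crop_center(voxels, class_weights=[0.0, 1.0])
```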