Azure

Below you will find the documentation for InnerEye/Azure.

Runner

InnerEye.Azure.azure_runner.additional_run_tags(azure_config: AzureConfig, commandline_args: str) → Dict[str, str][source]

Gets the set of tags that will be added to the AzureML run as metadata, like git status and user name.

Parameters:

azure_config – The configurations for the present AzureML job
commandline_args – A string that holds all commandline arguments that were used for the present run.

InnerEye.Azure.azure_runner.create_dataset_configs(azure_config: AzureConfig, all_azure_dataset_ids: List[str], all_dataset_mountpoints: List[str], all_local_datasets: List[Optional[Path]]) → List[DatasetConfig][source]

Sets up all the dataset consumption objects for the datasets provided. The returned list has one entry for each non-empty Azure dataset ID.

Valid argument combinations: N Azure datasets, 0 or N mount points, 0 or N local datasets.

Parameters:

azure_config – azure related configurations to use for model scale-out behaviour
all_azure_dataset_ids – The name of all datasets on blob storage that will be used for this run.
all_dataset_mountpoints – When using the datasets in AzureML, these are the per-dataset mount points.
all_local_datasets – The paths for all local versions of the datasets.

Returns:

A list of DatasetConfig objects, in the same order as datasets were provided in all_azure_dataset_ids, omitting datasets with an empty name.
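
The argument rule above (N dataset IDs, 0 or N mount points, 0 or N local datasets, empty names omitted) can be sketched as follows. This is a minimal illustration only: DatasetSketch and pair_datasets are hypothetical names that stand in for DatasetConfig and create_dataset_configs, and the error type is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for DatasetConfig, not the real class."""
    name: str
    mountpoint: str
    local_folder: Optional[Path]


def pair_datasets(dataset_ids: List[str],
                  mountpoints: List[str],
                  local_datasets: List[Optional[Path]]) -> List[DatasetSketch]:
    n = len(dataset_ids)
    # Each optional list must be empty or have exactly one entry per dataset.
    if len(mountpoints) not in (0, n):
        raise ValueError(f"Expected 0 or {n} mount points, got {len(mountpoints)}")
    if len(local_datasets) not in (0, n):
        raise ValueError(f"Expected 0 or {n} local datasets, got {len(local_datasets)}")
    result = []
    for i, name in enumerate(dataset_ids):
        if not name:
            # Datasets with an empty name are omitted from the result.
            continue
        result.append(DatasetSketch(
            name=name,
            mountpoint=mountpoints[i] if mountpoints else "",
            local_folder=local_datasets[i] if local_datasets else None))
    return result
```

Entries are matched by position, so a mount point that belongs to an empty dataset ID is dropped together with that ID.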

InnerEye.Azure.azure_runner.create_experiment_name(azure_config: AzureConfig) → str[source]

Gets the name of the AzureML experiment. This is taken from the commandline, or from the git branch.

Parameters:: azure_config – The object containing all Azure-related settings.
Returns:: The name to use for the AzureML experiment.

InnerEye.Azure.azure_runner.create_runner_parser(model_config_class: Optional[type] = None) → ArgumentParser[source]

Creates a commandline parser that understands all arguments necessary for running a script in Azure, plus all arguments for the given class. The class must be a subclass of GenericConfig.

Parameters:: model_config_class – A class that contains the model-specific parameters.
Returns:: An instance of ArgumentParser.

InnerEye.Azure.azure_runner.get_git_tags(azure_config: AzureConfig) → Dict[str, str][source]

Creates a dictionary with git-related information, like branch and commit ID. The dictionary key is a string that can be used as a tag on an AzureML run, the dictionary value is the git information. If git information is passed in via commandline arguments, those take precedence over information read out from the repository.

Parameters:: azure_config – An AzureConfig object specifying git-related commandline args.
Returns:: A dictionary mapping from tag name to git info.

InnerEye.Azure.azure_runner.parse_args_and_add_yaml_variables(parser: ArgumentParser, yaml_config_file: Optional[Path] = None, project_root: Optional[Path] = None, fail_on_unknown_args: bool = False) → ParserResult[source]

Reads arguments from sys.argv, modifies them with secrets from local YAML files, and parses them using the given argument parser.

Parameters:

project_root – The root folder for the whole project. Only used to access a private settings file.
parser – The parser to use.
yaml_config_file – The path to the YAML file that contains values to supply into sys.argv.
fail_on_unknown_args – If True, raise an exception if the parser encounters an argument that it does not recognize. If False, unrecognized arguments will be ignored, and added to the “unknown” field of the parser result.

Returns:

The parsed arguments, and overrides

InnerEye.Azure.azure_runner.parse_arguments(parser: ArgumentParser, settings_from_yaml: Optional[Dict[str, Any]] = None, fail_on_unknown_args: bool = False, args: Optional[List[str]] = None) → ParserResult[source]

Parses a list of commandline arguments with a given parser, and adds additional information read from YAML files. Returns results broken down into a full arguments dictionary, a dictionary of arguments that were set to non-default values, and unknown arguments.

Parameters:

parser – The parser to use
settings_from_yaml – A dictionary of settings read from a YAML config file.
fail_on_unknown_args – If True, raise an exception if the parser encounters an argument that it does not recognize. If False, unrecognized arguments will be ignored, and added to the “unknown” field of the parser result.
args – Arguments to parse. If not given, use those in sys.argv

Returns:

The parsed arguments, and overrides
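
The breakdown into all arguments, non-default arguments, and unknown arguments can be approximated with the standard argparse module. The sketch below is not the InnerEye implementation: parse_with_overrides is a hypothetical name, the YAML step is left out, and raising ValueError for unknown arguments is an assumption.

```python
import argparse
from typing import Any, Dict, List, Optional, Tuple


def parse_with_overrides(parser: argparse.ArgumentParser,
                         args: Optional[List[str]] = None,
                         fail_on_unknown_args: bool = False
                         ) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    # parse_known_args returns the namespace plus the unrecognized arguments.
    # If args is None, argparse falls back to sys.argv.
    namespace, unknown = parser.parse_known_args(args)
    if unknown and fail_on_unknown_args:
        raise ValueError(f"Unknown arguments: {unknown}")
    all_args = vars(namespace)
    # An override is any argument whose parsed value differs from its default.
    overrides = {k: v for k, v in all_args.items() if v != parser.get_default(k)}
    return all_args, overrides, unknown


demo_parser = argparse.ArgumentParser()
demo_parser.add_argument("--cluster", type=str, default="")
demo_parser.add_argument("--num_nodes", type=int, default=1)
all_args, overrides, unknown = parse_with_overrides(
    demo_parser, ["--num_nodes", "2", "--foo"])
# overrides now holds only num_nodes, and "--foo" ends up in unknown.
```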

InnerEye.Azure.azure_runner.run_duration_string_to_seconds(s: str) → Optional[int][source]

Parses a string that represents a timespan, and returns it converted into seconds. The string is expected to be a floating point number with a single character suffix s, m, h, or d for seconds, minutes, hours, or days. Examples: ‘3.5h’, ‘2d’. If the argument is an empty string, None is returned.

Parameters:: s – The string to parse.
Returns:: The timespan represented in the string converted to seconds.
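
A minimal sketch of the conversion described above. The name duration_to_seconds is hypothetical, and both the truncation to int and the ValueError for an unrecognized suffix are assumptions rather than documented behaviour.

```python
from typing import Optional

# Seconds per unit for the suffixes named in the description.
_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(s: str) -> Optional[int]:
    s = s.strip()
    if not s:
        # An empty string means that no duration was given.
        return None
    suffix = s[-1]
    if suffix not in _SECONDS_PER_UNIT:
        raise ValueError(f"Unknown duration suffix in '{s}'")
    # The part before the suffix is a floating point number.
    return int(float(s[:-1]) * _SECONDS_PER_UNIT[suffix])


print(duration_to_seconds("3.5h"))  # 12600
print(duration_to_seconds("2d"))    # 172800
```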

InnerEye.Azure.azure_runner.set_environment_variables_for_multi_node() → None[source]: Sets the environment variables that PyTorch Lightning needs for multi-node training.

Configuration

class InnerEye.Azure.azure_config.AzureConfig(**params: Any)[source]

Azure related configurations to set up valid workspace. Note that for a parameter to be settable (when not given on the command line) to a value from settings.yml, its default here needs to be None and not the empty string, and its type will be Optional[str], not str.

application_id: str = ''

azureml: bool = False

azureml_datastore: str = ''

build_branch: str = ''

build_number: int = 0

build_source_author: str = ''

build_source_id: str = ''

build_source_message: str = ''

build_source_repository: str = ''

build_user: str = 'docs'

build_user_email: str = 'docs'

cluster: str = ''

docker_shm_size: str = '440g'

experiment_name: str = ''

extra_code_directory: str = ''

fetch_run(run_recovery_id: str) → Run[source]

Gets an instantiated Run object for a given run recovery ID (format experiment_name:run_id).

Parameters:: run_recovery_id – A run recovery ID (format experiment_name:run_id)

static from_yaml(yaml_file_path: Path, project_root: Optional[Path]) → AzureConfig[source]

Creates an AzureConfig object with default values, with the keys/secrets populated from values in the given YAML file. If a project_root folder is provided, a private settings file is read from there as well.

Parameters:

yaml_file_path – Path to the YAML file that contains values to create the AzureConfig
project_root – A folder in which to search for a private settings file.

Returns:

AzureConfig with values populated from the yaml files.

get_git_information() → GitInformation[source]: Gets all version control information about the present source code in the project_root_directory. Information is taken from commandline arguments, or if not given there, retrieved from git directly. The result of the first call to this function is cached, and returned in later calls.

get_service_principal_auth() → Optional[Union[InteractiveLoginAuthentication, ServicePrincipalAuthentication]][source]: Creates a service principal authentication object with the application ID stored in the present object. The application key is read from the environment. Returns a ServicePrincipalAuthentication object that has the application ID and key, or None if the key is not present.

get_workspace() → Workspace[source]: Returns a workspace object for an existing Azure Machine Learning Workspace (or default from YAML). When running inside AzureML, the workspace that is retrieved is always the one in the current run context. When running outside AzureML, it is created or accessed with the service principal. The workspace is read only in the first call to this method; subsequent calls return a cached value. Throws an exception if the workspace doesn’t exist or the required fields don’t lead to a uniquely identifiable workspace. Returns the Azure Machine Learning Workspace.

hyperdrive: bool = False

log_level: str = 'INFO'

max_run_duration: str = ''

model: str = ''

name = 'AzureConfig'

num_nodes: int = 1

only_register_model: bool = False

param = <param.parameterized.Parameters object>

pip_extra_index_url: str = ''

project_root: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/innereye-deeplearning/checkouts/latest')

pytest_mark: str = ''

resource_group: str = ''

run_recovery_id: str = ''

subscription_id: str = ''

tag: str = ''

tenant_id: str = ''

tensorboard: bool = False

train: bool = True

use_dataset_mount: bool = False

validate() → None[source]: Validation method called directly after init to be overridden by children if required

wait_for_completion: bool = False

workspace_name: str = ''

class InnerEye.Azure.azure_config.GitInformation(repository: str, branch: str, commit_id: str, commit_message: str, commit_author: str, is_dirty: bool)[source]

Contains information about the git repository that was used to submit the present experiment.

branch: str

commit_author: str

commit_id: str

commit_message: str

is_dirty: bool

repository: str

class InnerEye.Azure.azure_config.ParserResult(args: Dict[str, Any], unknown: List[str], overrides: Dict[str, Any], known_settings_from_yaml: Dict[str, Any], unknown_settings_from_yaml: Dict[str, Any])[source]

Stores the results of running an argument parser, broken down into an argument-to-value dictionary, arguments that the parser does not recognize, and settings that were read from YAML files.

args: Dict[str, Any]

known_settings_from_yaml: Dict[str, Any]

overrides: Dict[str, Any]

unknown: List[str]

unknown_settings_from_yaml: Dict[str, Any]

class InnerEye.Azure.azure_config.SourceConfig(root_folder: ~pathlib.Path, entry_script: ~pathlib.Path, conda_dependencies_files: ~typing.List[~pathlib.Path], script_params: ~typing.List[str] = <factory>, hyperdrive_config_func: ~typing.Optional[~typing.Callable[[~azureml.core.script_run_config.ScriptRunConfig], ~azureml.train.hyperdrive.runconfig.HyperDriveConfig]] = None, upload_timeout_seconds: int = 36000, environment_variables: ~typing.Optional[~typing.Dict[str, str]] = None)[source]

Contains all information that is required to submit a script to AzureML: Entry script, arguments, and information to set up the Python environment inside of the AzureML virtual machine.

conda_dependencies_files: List[Path]

entry_script: Path

environment_variables: Optional[Dict[str, str]] = None

hyperdrive_config_func: Optional[Callable[[ScriptRunConfig], HyperDriveConfig]] = None

root_folder: Path

script_params: List[str]

upload_timeout_seconds: int = 36000

Utils

InnerEye.Azure.azure_util.download_run_output_file(blob_path: Path, destination: Path, run: Run) → Path[source]

Downloads a single file from the run’s default output directory: DEFAULT_AML_UPLOAD_DIR (“outputs”). For example, if blob_path = “foo/bar.csv”, then the run result file “outputs/foo/bar.csv” will be downloaded to <destination>/bar.csv (the directory will be stripped off).

Parameters:

blob_path – The name of the file to download.
run – The AzureML run to download the files from
destination – Local path to save the downloaded blob to.

Returns:

Destination path to the downloaded file(s)

InnerEye.Azure.azure_util.download_run_outputs_by_prefix(blobs_prefix: Path, destination: Path, run: Run) → None[source]

Download all the blobs from the run’s default output directory: DEFAULT_AML_UPLOAD_DIR (“outputs”) that have a given prefix (folder structure). When saving, the prefix string will be stripped off. For example, if blobs_prefix = “foo”, and the run has a file “outputs/foo/bar.csv”, it will be downloaded to destination/bar.csv. If there is in addition a file “foo.txt”, that file will be skipped.

Parameters:

blobs_prefix – The prefix for all files in “outputs” that should be downloaded.
run – The AzureML run to download the files from.
destination – Local path to save the downloaded blobs to.
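
The prefix rule can be illustrated without touching AzureML by computing which blob names would be selected and where they would be saved. This is a sketch only: plan_downloads is a hypothetical helper, and it assumes that the skipped file “foo.txt” sits directly in “outputs”.

```python
from pathlib import PurePosixPath
from typing import Dict, List

OUTPUTS = "outputs"  # value of DEFAULT_AML_UPLOAD_DIR according to the text above


def plan_downloads(blob_names: List[str], blobs_prefix: str,
                   destination: str) -> Dict[str, str]:
    folder = PurePosixPath(OUTPUTS) / blobs_prefix
    plan = {}
    for name in blob_names:
        try:
            # relative_to compares whole path components, so "outputs/foo.txt"
            # does not count as being inside "outputs/foo".
            relative = PurePosixPath(name).relative_to(folder)
        except ValueError:
            continue
        plan[name] = str(PurePosixPath(destination) / relative)
    return plan


print(plan_downloads(["outputs/foo/bar.csv", "outputs/foo.txt"], "foo", "destination"))
# {'outputs/foo/bar.csv': 'destination/bar.csv'}
```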

InnerEye.Azure.azure_util.fetch_child_runs(run: Run, status: Optional[str] = None, expected_number_cross_validation_splits: int = 0) → List[Run][source]

Fetches child runs for the provided run that have the provided AML status (or fetches all by default) and have a run_recovery_id tag value set (this is to ignore superfluous AML infrastructure platform runs).

Parameters:

run – parent run to fetch child run from
status – if provided, returns only child runs with this status
expected_number_cross_validation_splits – when recovering child runs from AML hyperdrive sometimes the get_children function fails to retrieve all children. If the number of child runs retrieved by AML is lower than the expected number of splits, we try to retrieve them manually.

InnerEye.Azure.azure_util.fetch_run(workspace: Workspace, run_recovery_id: str) → Run[source]

Finds an existing run in an experiment, based on a recovery ID that contains the experiment ID and the actual RunId. The run can be specified either in the experiment_name:run_id format, or just the run_id.

Parameters:

workspace – the configured AzureML workspace to search for the experiment.
run_recovery_id – The Run to find. Either in the full recovery ID format, experiment_name:run_id or just the run_id

Returns:

The AzureML run.

InnerEye.Azure.azure_util.fetch_runs(experiment: Experiment, filters: List[str]) → List[Run][source]

Fetch the runs in an experiment.

Parameters:

experiment – the experiment to fetch runs from
filters – a list of run statuses to include. Must be a subset of [Running, Completed, Failed, Canceled].

Returns:

the list of runs in the experiment

InnerEye.Azure.azure_util.get_all_environment_files(project_root: Path) → List[Path][source]

Returns a list of all Conda environment files that should be used. This is firstly the InnerEye conda file, and possibly a second environment.yml file that lives at the project root folder.

Parameters:: project_root – The root folder of the code that starts the present training run.
Returns:: A list with 1 or 2 entries that are conda environment files.

InnerEye.Azure.azure_util.get_comparison_baseline_paths(outputs_folder: Path, blob_path: Path, run: Run, dataset_csv_file_name: str) → Tuple[Optional[Path], Optional[Path]][source]

InnerEye.Azure.azure_util.get_cross_validation_split_index(run: Run) → int[source]

Gets the cross validation index from the run’s tags or returns the default

Parameters:: run – Run context from which to get index
Returns:: The cross validation split index

InnerEye.Azure.azure_util.get_run_context_or_default(run: Optional[Run] = None) → Run[source]

Returns the context of the run, if run is not None. If run is None, returns the context of the current run.

Parameters:: run – Run to retrieve context for. If None, retrieve context of current run.
Returns:: Run context

InnerEye.Azure.azure_util.is_cross_validation_child_run(run: Run) → bool[source]

Checks the provided run’s tags to determine if it is a cross validation child run (which is the case if the split index >=0)

Parameters:: run – Run to check.
Returns:: True if cross validation run. False otherwise.

InnerEye.Azure.azure_util.is_ensemble_run(run: Run) → bool[source]: Checks if the run was an ensemble of multiple models

InnerEye.Azure.azure_util.is_offline_run_context(run_context: Run) → bool[source]

Tells if a run_context is offline by checking if it has an experiment associated with it.

Parameters:: run_context – Context of the run to check
Returns:

InnerEye.Azure.azure_util.is_parent_run(run: Run) → bool[source]

InnerEye.Azure.azure_util.is_running_on_azure_agent() → bool[source]: Returns True if the code appears to be running on an Azure build agent, and False otherwise.

InnerEye.Azure.azure_util.split_recovery_id(id: str) → Tuple[str, str][source]

Splits a run ID into the experiment name and the actual run. The argument can be in the format ‘experiment_name:run_id’, or just a run ID like user_branch_abcde12_123. In the latter case, everything before the last two alphanumeric parts is assumed to be the experiment name.

Parameters:: id –
Returns:: experiment name and run name
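
A minimal sketch of the two accepted formats. The name split_recovery_id_sketch is hypothetical. Returning the whole input as the run ID when no colon is present, and raising ValueError for other shapes, are assumptions that the description above does not confirm.

```python
from typing import Tuple


def split_recovery_id_sketch(recovery_id: str) -> Tuple[str, str]:
    recovery_id = recovery_id.strip()
    if ":" in recovery_id:
        # Full format: experiment_name:run_id
        experiment, run_id = recovery_id.split(":", 1)
        return experiment, run_id
    # Run ID only: everything before the last two parts is the experiment name.
    parts = recovery_id.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"Unexpected recovery ID format: {recovery_id}")
    return parts[0], recovery_id


print(split_recovery_id_sketch("my_experiment:run_42"))
# ('my_experiment', 'run_42')
print(split_recovery_id_sketch("user_branch_abcde12_123"))
# ('user_branch', 'user_branch_abcde12_123')
```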

InnerEye.Azure.azure_util.step_up_directories(path: Path) → Generator[Path, None, None][source]: Generates the provided directory and all its parents. Needed because dataset.csv files are sometimes not where we expect them to be, but higher up.
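
A generator with this behaviour can be sketched with pathlib alone. The name walk_up is hypothetical, and the order (the directory itself first, then each parent up to the root) is an assumption.

```python
from pathlib import PurePath, PurePosixPath
from typing import Iterator


def walk_up(path: PurePath) -> Iterator[PurePath]:
    # First the directory itself, then each parent up to the root.
    yield path
    yield from path.parents


for folder in walk_up(PurePosixPath("/data/experiment/run")):
    # A caller could check here whether folder / "dataset.csv" exists.
    print(folder)
```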

InnerEye.Azure.azure_util.strip_prefix(string: str, prefix: str) → str[source]

Returns the string without the prefix if it has the prefix, otherwise the string unchanged.

Parameters:

string – Input string.
prefix – Prefix to remove from input string.

Returns:

Input string with prefix removed.
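
The behaviour is small enough to sketch in a few lines. strip_prefix_sketch is a hypothetical name used to keep the example separate from the library function.

```python
def strip_prefix_sketch(string: str, prefix: str) -> str:
    # Only remove the prefix when the string actually starts with it.
    if string.startswith(prefix):
        return string[len(prefix):]
    return string


print(strip_prefix_sketch("outputs/foo.csv", "outputs/"))  # foo.csv
print(strip_prefix_sketch("foo.csv", "outputs/"))          # foo.csv
```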

InnerEye.Azure.azure_util.tag_values_all_distinct(runs: List[Run], tag: str) → bool[source]: Returns True iff the runs all have the specified tag and all the values are different.
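
The check can be sketched on plain dictionaries that stand in for the tags of each run. tags_all_distinct is a hypothetical helper, and the use of one tag dictionary per run is an assumption made to keep the example free of AzureML objects.

```python
from typing import Dict, List


def tags_all_distinct(all_tags: List[Dict[str, str]], tag: str) -> bool:
    seen = set()
    for tags in all_tags:
        value = tags.get(tag)
        # Fail if the tag is missing, or if its value was already seen.
        if value is None or value in seen:
            return False
        seen.add(value)
    return True


print(tags_all_distinct([{"split": "0"}, {"split": "1"}], "split"))  # True
print(tags_all_distinct([{"split": "0"}, {"split": "0"}], "split"))  # False
```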

InnerEye.Azure.azure_util.to_azure_friendly_container_path(path: Path) → str[source]

Converts a path to an Azure friendly container path by replacing “\” and “//” with “/” so it can be in the form: a/b/c.

Parameters:: path – Original path
Returns:: Converted path
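
A sketch of the conversion, assuming that the characters being replaced are the backslash and the double forward slash. to_container_path is a hypothetical name, and it takes a plain string rather than a Path so that the example behaves the same on every platform.

```python
def to_container_path(path: str) -> str:
    # Backslashes become forward slashes, then each "//" becomes "/".
    return path.replace("\\", "/").replace("//", "/")


print(to_container_path("a\\b//c"))  # a/b/c
```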

InnerEye.Azure.azure_util.to_azure_friendly_string(x: Optional[str]) → Optional[str][source]: Given a string, ensure it can be used in Azure by replacing everything apart from a-zA-Z0-9_ with _, and replace multiple _ with a single _.
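
The two replacement steps map directly onto two regular expression substitutions. In this sketch, azure_friendly is a hypothetical name, and passing None through unchanged is an assumption based on the Optional[str] signature.

```python
import re
from typing import Optional


def azure_friendly(x: Optional[str]) -> Optional[str]:
    if x is None:
        return None
    # Step 1: replace everything apart from a-zA-Z0-9_ with an underscore.
    replaced = re.sub(r"[^a-zA-Z0-9_]", "_", x)
    # Step 2: collapse runs of underscores into a single underscore.
    return re.sub(r"_+", "_", replaced)


print(azure_friendly("my run/2021-01"))  # my_run_2021_01
```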

InnerEye.Azure.parser_util.value_to_string(x: object) → str[source]

Returns a string representation of x, with special treatment of Enums (return their value) and lists (return comma-separated list).

Parameters:: x – Object to convert to string
Returns:: The string representation of the object.

Special cases: For Enums, returns their value, for lists, returns a comma-separated list.
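
The special cases can be sketched as below. to_string is a hypothetical name, and two details are assumptions: list items are converted recursively, and they are joined with a comma without surrounding spaces.

```python
from enum import Enum
from typing import Any


class Mode(Enum):
    """Example enum used only for this illustration."""
    TRAIN = "train"
    TEST = "test"


def to_string(x: Any) -> str:
    if isinstance(x, Enum):
        # Enums are represented by their value, not by their name.
        return str(x.value)
    if isinstance(x, list):
        return ",".join(to_string(item) for item in x)
    return str(x)


print(to_string(Mode.TRAIN))       # train
print(to_string([Mode.TEST, 3]))   # test,3
```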

class InnerEye.Azure.secrets_handling.SecretsHandling(project_root: Path)[source]

Contains methods to read secrets from environment variables and/or files on disk.

get_secret_from_environment(name: str, allow_missing: bool = False) → Optional[str][source]

Gets a password or key from the secrets file or environment variables.

Parameters:

name – The name of the environment variable to read. It will be converted to uppercase.
allow_missing – If true, the function returns None if there is no entry of the given name in any of the places searched. If false, missing entries will raise a ValueError.

Returns:

Value of the secret. None, if there is no value and allow_missing is True.
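
The environment part of this lookup can be sketched with os.environ. secret_from_environment is a hypothetical name, and the sketch leaves out the secrets file that the method may also consult.

```python
import os
from typing import Optional


def secret_from_environment(name: str, allow_missing: bool = False) -> Optional[str]:
    # The name is converted to uppercase before the lookup.
    value = os.environ.get(name.upper())
    if value is None and not allow_missing:
        raise ValueError(f"There is no environment variable named {name.upper()}")
    return value
```

With allow_missing=True a caller gets None and can fall back to another authentication route. The default turns a missing secret into an immediate error.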

get_secrets_from_environment_or_file(secrets_to_read: List[str]) → Dict[str, Optional[str]][source]

Attempts to read secrets from the project secret file. If there is no secrets file, it returns all secrets in secrets_to_read read from environment variables. When reading from environment, if an expected secret is not found, its value will be None.

Parameters:: secrets_to_read – The list of secret names to read from the YAML file. These will be converted to uppercase.

read_secrets_from_file(secrets_to_read: List[str]) → Optional[Dict[str, str]][source]

Reads the secrets from file in YAML format, and returns the contents as a dictionary. The YAML file is expected in the project root directory.

Parameters:: secrets_to_read – The list of secret names to read from the YAML file. These will be converted to uppercase.
Returns:: A dictionary with secrets, or None if the file does not exist.

InnerEye.Azure.secrets_handling.read_all_settings(project_settings_file: Optional[Path] = None, project_root: Optional[Path] = None) → Dict[str, Any][source]

Reads settings from files in YAML format, and returns the union of settings found. The first settings file to read is project_settings_file. The second settings file is ‘InnerEyePrivateSettings.yml’ expected in the project_root folder. Settings in the private settings file override those in the project settings. Both settings files are expected in YAML format, with an entry called ‘variables’.

Parameters:

project_settings_file – The first YAML settings file to read.
project_root – The folder that can contain a ‘InnerEyePrivateSettings.yml’ file.

Returns:

A dictionary mapping from string to variable value. The dictionary key is the union of variable names found in the two settings files.

InnerEye.Azure.secrets_handling.read_settings_and_merge(project_settings_file: Optional[Path] = None, private_settings_file: Optional[Path] = None) → Dict[str, Any][source]

Reads settings from files in YAML format, and returns the union of settings found. First, the project settings file is read into a dictionary, then the private settings file is read. Settings in the private settings file override those in the project settings. Both settings files are expected in YAML format, with an entry called ‘variables’.

Parameters:

project_settings_file – The first YAML settings file to read.
private_settings_file – The second YAML settings file to read. Settings in this file have higher priority.

Returns:

A dictionary mapping from string to variable value. The dictionary key is the union of variable names found in the two settings files.
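
The override order can be sketched on dictionaries that represent the two YAML documents after parsing. merge_variables is a hypothetical name. Treating a missing file as None, and failing with KeyError when the ‘variables’ entry is absent, are assumptions.

```python
from typing import Any, Dict, Optional


def merge_variables(project_doc: Optional[Dict[str, Any]],
                    private_doc: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    merged: Dict[str, Any] = {}
    # The private document comes second, so its values win on conflicts.
    for doc in (project_doc, private_doc):
        if doc is not None:
            merged.update(doc["variables"])
    return merged


project = {"variables": {"cluster": "shared", "tenant_id": "t"}}
private = {"variables": {"cluster": "mine"}}
print(merge_variables(project, private))
# {'cluster': 'mine', 'tenant_id': 't'}
```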

InnerEye.Azure.secrets_handling.read_settings_yaml_file(yaml_file: Path) → Dict[str, Any][source]

Reads a YAML file, that is expected to contain an entry ‘variables’. Returns the dictionary for the ‘variables’ section of the file.

Parameters:: yaml_file – The yaml file to read.
Returns:: A dictionary with the variables from the yaml file.