Runner

This script is the entry point for running all InnerEye jobs, both locally and in AzureML. See the user guides for example usage.

usage: runner.py [-h] [--subscription_id SUBSCRIPTION_ID]
                 [--tenant_id TENANT_ID] [--application_id APPLICATION_ID]
                 [--azureml_datastore AZUREML_DATASTORE]
                 [--workspace_name WORKSPACE_NAME]
                 [--resource_group RESOURCE_GROUP]
                 [--docker_shm_size DOCKER_SHM_SIZE]
                 [--hyperdrive [HYPERDRIVE]] [--cluster CLUSTER]
                 [--pip_extra_index_url PIP_EXTRA_INDEX_URL]
                 [--azureml [AZUREML]] [--tensorboard [TENSORBOARD]]
                 [--train TRAIN | --no-train] [--model MODEL]
                 [--only_register_model [ONLY_REGISTER_MODEL]]
                 [--pytest_mark PYTEST_MARK]
                 [--run_recovery_id RUN_RECOVERY_ID]
                 [--experiment_name EXPERIMENT_NAME]
                 [--build_number BUILD_NUMBER] [--build_user BUILD_USER]
                 [--build_user_email BUILD_USER_EMAIL]
                 [--build_source_repository BUILD_SOURCE_REPOSITORY]
                 [--build_branch BUILD_BRANCH]
                 [--build_source_id BUILD_SOURCE_ID]
                 [--build_source_message BUILD_SOURCE_MESSAGE]
                 [--build_source_author BUILD_SOURCE_AUTHOR] [--tag TAG]
                 [--log_level LOG_LEVEL]
                 [--wait_for_completion [WAIT_FOR_COMPLETION]]
                 [--use_dataset_mount [USE_DATASET_MOUNT]]
                 [--extra_code_directory EXTRA_CODE_DIRECTORY]
                 [--project_root PROJECT_ROOT]
                 [--max_run_duration MAX_RUN_DURATION] [--num_nodes NUM_NODES]
                 [--model_configs_namespace MODEL_CONFIGS_NAMESPACE]

Named Arguments

--subscription_id

The ID of your Azure subscription.

Default: “”

--tenant_id

The Azure tenant ID.

Default: “”

--application_id

Optional: The ID of the Service Principal for authentication to Azure.

Default: “”

--azureml_datastore

The name of the AzureML datastore that holds the input training data. The datastore must be created manually and must point to a folder inside the datasets storage account.

Default: “”

--workspace_name

The name of the AzureML workspace that should be used.

Default: “”

--resource_group

The Azure resource group that contains the AzureML workspace.

Default: “”

--docker_shm_size

The size of the shared memory (/dev/shm) available inside the Docker container on the AzureML VMs.

Default: “440g”
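The default “440g” is written in Docker's size notation: a whole number followed by an optional unit. The sketch below converts such a string to bytes. It assumes a simplified subset of Docker's convention (units b, k, m, g with binary, 1024-based multipliers), and `shm_size_to_bytes` is a hypothetical helper, not part of InnerEye.

```python
import re

# Assumed Docker-style units with binary multipliers (simplified subset).
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def shm_size_to_bytes(size: str) -> int:
    """Convert a size string such as '440g' or '512m' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", size.strip().lower())
    if match is None:
        raise ValueError(f"Invalid shared memory size: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


print(shm_size_to_bytes("440g"))  # 472446402560
```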

--hyperdrive

If True, use AzureML HyperDrive for run execution.

Default: False

--cluster

The name of the GPU cluster inside the AzureML workspace that should execute the job.

Default: “”

--pip_extra_index_url

An additional package index URL from which pip packages should be installed.

Default: “”

--azureml

If True, submit the currently executing script to run in AzureML.

Default: False
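The usage line writes boolean options such as this one as `[--azureml [AZUREML]]`, meaning the value after the flag is optional. The sketch below shows one way to get that behaviour with argparse, assuming that the bare flag means True and an explicit value is parsed as a boolean. `str_to_bool` is a hypothetical helper, and this is not necessarily how runner.py declares its options.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse common true/false spellings (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser()
# nargs="?" makes the value optional: "--azureml" alone uses const=True,
# "--azureml False" parses the given string, and omitting the flag uses default=False.
parser.add_argument("--azureml", type=str_to_bool, nargs="?", const=True, default=False)

print(parser.parse_args([]).azureml)                      # False
print(parser.parse_args(["--azureml"]).azureml)           # True
print(parser.parse_args(["--azureml", "False"]).azureml)  # False
```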

--tensorboard

If True, then automatically launch TensorBoard to monitor the latest submitted AzureML run.

Default: False

--train

If True, train a new model. If False, run inference on an existing model. For inference, you need to specify --run_recovery_id=… as well.

Default: True

--no-train

The opposite of --train: run inference on an existing model instead of training a new one.

Default: True
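The usage line shows --train and --no-train as the mutually exclusive pair `[--train TRAIN | --no-train]`. A minimal argparse sketch that produces the same usage entry, assuming both options write to a single `train` attribute that defaults to True; `str_to_bool` is a hypothetical helper and the sketch is not taken from runner.py itself.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse common true/false spellings (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser(prog="runner.py")
group = parser.add_mutually_exclusive_group()
# Both options store into the same attribute; the group renders them
# as "[--train TRAIN | --no-train]" in the usage line.
group.add_argument("--train", type=str_to_bool, dest="train")
group.add_argument("--no-train", dest="train", action="store_false")
parser.set_defaults(train=True)

print(parser.parse_args([]).train)                  # True
print(parser.parse_args(["--no-train"]).train)      # False
print(parser.parse_args(["--train", "false"]).train)  # False
```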

--model

The name of the model to train/test.

Default: “”

--only_register_model

If set, and run_recovery_id is also set, register the model that was trained in the recovery run, but do not run training or inference.

Default: False
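Together, --train, --only_register_model and --run_recovery_id decide what a run does. The sketch below encodes the rules stated in this reference, under the assumption that requesting inference without a run recovery ID is treated as an error. `select_mode` and the mode names are hypothetical, not InnerEye functions.

```python
from typing import Optional


def select_mode(train: bool, only_register_model: bool, run_recovery_id: Optional[str]) -> str:
    """Decide what a run does from three runner options (hypothetical helper)."""
    if only_register_model and run_recovery_id:
        # Register the model trained in the recovery run; skip training and inference.
        return "register_only"
    if train:
        return "train"
    if not run_recovery_id:
        # Assumption: inference without a recovery run is rejected outright.
        raise ValueError("Inference requires --run_recovery_id to be set.")
    return "inference"


print(select_mode(train=False, only_register_model=False, run_recovery_id="exp:run_1"))  # inference
```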

--pytest_mark

If provided, run pytest instead of model training. pytest will only run the tests that have the mark given in this argument (for example, ‘--pytest_mark gpu’ will run all tests marked with ‘pytest.mark.gpu’).

Default: “”

--run_recovery_id

A run recovery ID string in the form ‘experiment name:run id’, used to run inference, to recover a model training run, or to register a model.

Default: “”
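A run recovery ID combines the experiment name and the run ID, separated by a colon. A minimal sketch that splits such a string, assuming the experiment name itself contains no colon so the split happens at the first one. `split_run_recovery_id` and the example values are hypothetical.

```python
from typing import Tuple


def split_run_recovery_id(run_recovery_id: str) -> Tuple[str, str]:
    """Split 'experiment name:run id' into its two parts (hypothetical helper).

    Assumes the experiment name contains no colon, so the first colon is the separator.
    """
    experiment_name, sep, run_id = run_recovery_id.partition(":")
    if not sep or not experiment_name or not run_id:
        raise ValueError(f"Expected 'experiment name:run id', got {run_recovery_id!r}")
    return experiment_name, run_id


print(split_run_recovery_id("my_experiment:my_experiment_1234_abcd"))
# ('my_experiment', 'my_experiment_1234_abcd')
```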

--experiment_name

If provided, use this string as the name of the AzureML experiment. If not provided, derive the experiment name from the git branch name.

Default: “”

--build_number

The numeric ID of the Azure pipeline that triggered this training run.

Default: 0

--build_user

The name of the user who started this run.

Default: “docs”

--build_user_email

The email address of the user who started this run. If not provided, the alias of the current user is used.

Default: “docs”

--build_source_repository

The name of the repository this source belongs to.

Default: “”

--build_branch

The branch this experiment has been triggered from.

Default: “”

--build_source_id

The git commit that was used to create this build.

Default: “”

--build_source_message

The message associated with the git commit that was used to create this build.

Default: “”

--build_source_author

The author of the git commit that was used to create this build.

Default: “”

--tag

A string that will be added as a tag to this experiment.

Default: “”

--log_level

The level of diagnostic information that should be printed out to the console.

Default: “INFO”
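A sketch of turning the --log_level string into a console logging setup, assuming the option accepts the standard Python logging level names (DEBUG, INFO, WARNING, ERROR, CRITICAL). `to_logging_level` is a hypothetical helper; it relies on `logging.getLevelName`, which maps a registered level name to its number and returns a "Level …" string for unknown names.

```python
import logging


def to_logging_level(name: str) -> int:
    """Map a level name such as 'INFO' to its numeric value (hypothetical helper)."""
    level = logging.getLevelName(name.upper())
    if not isinstance(level, int):
        # getLevelName returns the string "Level <name>" for unknown names.
        raise ValueError(f"Unknown log level: {name!r}")
    return level


logging.basicConfig(level=to_logging_level("INFO"))
logging.getLogger(__name__).debug("Not printed at INFO level")
logging.getLogger(__name__).info("Printed at INFO level")
```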

--wait_for_completion

If True, wait until the AzureML job has completed or failed. If False, submit the job and exit.

Default: False

--use_dataset_mount

If True, consume an AzureML Dataset by mounting it at job start. If False, consume it by downloading it at job start. When running outside AzureML, datasets are always downloaded.

Default: False
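The mount-versus-download rule can be summarised as a small decision function. This is a sketch of the rule as stated for --use_dataset_mount, not InnerEye code; `dataset_access_mode` and its `running_in_azureml` parameter are hypothetical.

```python
def dataset_access_mode(use_dataset_mount: bool, running_in_azureml: bool) -> str:
    """Return 'mount' or 'download' for an AzureML Dataset (hypothetical helper)."""
    if running_in_azureml and use_dataset_mount:
        return "mount"
    # Outside AzureML, or when mounting is disabled, the dataset is downloaded at job start.
    return "download"


print(dataset_access_mode(use_dataset_mount=True, running_in_azureml=False))  # download
```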

--extra_code_directory

Directory (relative to project root) containing code (e.g. model config) to be included in the model for inference. Ignored by default.

Default: “”
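Because --extra_code_directory is relative to the project root and defaults to an empty string, resolving it amounts to joining the two paths only when a value was given. A minimal sketch under that reading; `resolve_extra_code_directory` and the folder name `MyConfigs` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_extra_code_directory(project_root: Path, extra_code_directory: str) -> Optional[Path]:
    """Resolve --extra_code_directory against the project root (hypothetical helper).

    An empty string (the default) means no extra code directory is used.
    """
    if not extra_code_directory:
        return None
    return project_root / extra_code_directory


print(resolve_extra_code_directory(Path("/repo"), "MyConfigs"))  # /repo/MyConfigs on POSIX systems
```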

--project_root

The root folder that contains all code of the project that starts the InnerEye run.

Default: /home/docs/checkouts/readthedocs.org/user_builds/innereye-deeplearning/checkouts/latest

--max_run_duration

The maximum runtime that is allowed for this job when running in AzureML. This is a floating point number with a string suffix s, m, h, or d for seconds, minutes, hours, or days. Examples: ‘3.5h’, ‘2d’

Default: “”
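A sketch of how a value such as ‘3.5h’ or ‘2d’ can be converted to seconds, following the suffixes listed for --max_run_duration. `run_duration_to_seconds` is a hypothetical helper; the runner may parse the value differently.

```python
import re

# Multipliers for the documented suffixes: s, m, h, d.
_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 60 * 60, "d": 24 * 60 * 60}


def run_duration_to_seconds(duration: str) -> float:
    """Convert strings such as '3.5h' or '2d' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd])", duration.strip())
    if match is None:
        raise ValueError(f"Invalid run duration: {duration!r}")
    value, unit = match.groups()
    return float(value) * _SECONDS_PER_UNIT[unit]


print(run_duration_to_seconds("3.5h"))  # 12600.0
print(run_duration_to_seconds("2d"))    # 172800.0
```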

--num_nodes

The number of virtual machines that will be allocated for this job in AzureML.

Default: 1

--model_configs_namespace

A non-default namespace in which to search for model configs.