Quick start guide

Installation

You can install WORC either using pip or from the source code. We strongly advise you to install WORC in a virtualenv.

Installing via pip

You can simply install WORC using pip:

pip install WORC

Installing from source code

To install from source code, clone the repository with git via the command line:

git clone https://github.com/MStarmans91/WORC.git  # for https
git clone git@github.com:MStarmans91/WORC.git      # for ssh

Then install the cloned package, for example with pip:

cd WORC
pip install .

Windows installation

On Windows, we strongly recommend installing Python through the Anaconda distribution.

Regardless of how you installed Python, you will need Microsoft Visual Studio: the Community edition can be downloaded and installed for free.

If you still get an error similar to "error: Microsoft Visual C++ 14.0 is required. Get it with Microsoft Visual C++ Build Tools", install the Microsoft Visual C++ Build Tools as indicated in the error message.

Tutorials

To start out using WORC, we strongly recommend following the tutorials in the WORCTutorial Github repository. This repository contains tutorials for an introduction to WORC, as well as more advanced workflows. We recommend starting with WORCTutorialSimple, of which the part below is an exact copy.

If you run into any issues, you can first debug your network using the fastr trace tool. If you are still stuck, check out the FAQ at https://worc.readthedocs.io/en/latest/static/faq.html, or feel free to post an issue on the WORC Github.

Running an experiment

We strongly recommend following the tutorials; see the section above. This section gives a point-by-point description of the tutorial.

Below is the same script as in the SimpleWORC tutorial in the WORCTutorial Github. In this tutorial, we use the SimpleWORC facade, which simplifies interacting with WORC. Additional information on SimpleWORC can be found in the User Manual. That chapter also documents the BasicWORC facade, which is slightly more advanced, and the WORC object itself, which provides the most advanced options.

Import packages

First, import WORC and some additional Python packages.

from WORC import SimpleWORC
import os

# These packages are only used in analysing the results
import pandas as pd
import json
import fastr
import glob

# If you do not want to use your own data, you can use the following example
# dataset; see also the next code block in this example.
from WORC.exampledata.datadownloader import download_HeadAndNeck

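# Define the folder this script is in, so we can easily find the example data.
# NOTE: __file__ is only defined when running this code as a script; it is not
# defined in, e.g., a Jupyter notebook.
script_path = os.path.dirname(os.path.abspath(__file__))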

# Determine whether you would like to use WORC for binary_classification,
# multiclass_classification or regression
modus = 'binary_classification'

Input

The minimal inputs to WORC are:

Images

Segmentations

Labels

In SimpleWORC, we assume you have a folder “datadir” containing one subfolder per patient, each holding an image.nii.gz and a mask.nii.gz:

datadir
    Patient_001
        image.nii.gz
        mask.nii.gz
    Patient_002
        image.nii.gz
        mask.nii.gz
    …
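Before running an experiment on your own data, it can help to check that it follows this layout. The sketch below is not part of WORC: the function name check_datadir is hypothetical, and it only assumes the folder structure described above (one subfolder per patient, each containing image.nii.gz and mask.nii.gz).

```python
from pathlib import Path


def check_datadir(datadir, image_file_name='image.nii.gz',
                  mask_file_name='mask.nii.gz'):
    """Map each patient folder to the expected files it is missing.

    Hypothetical helper, not part of WORC: every subfolder of datadir is
    treated as one patient; loose files in datadir are ignored.
    """
    missing = {}
    for patient_dir in sorted(Path(datadir).iterdir()):
        if not patient_dir.is_dir():
            continue  # e.g. a README next to the patient folders
        absent = [name for name in (image_file_name, mask_file_name)
                  if not (patient_dir / name).is_file()]
        if absent:
            missing[patient_dir.name] = absent
    return missing


# Hypothetical usage on your own data folder:
# problems = check_datadir('/path/to/datadir')
# print(problems or 'All patient folders contain an image and a mask.')
```

An empty dictionary means every patient folder contains both files; otherwise, the keys tell you which patient folders to fix.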

In the example, we will use open source data from the online BMIA XNAT platform. This dataset consists of CT scans of patients with Head and Neck tumors. We will download a subset of 20 patients into the Data folder next to this script. You can change these settings if you like.

nsubjects = 20  # use "all" to download all patients
data_path = os.path.join(script_path, 'Data')
download_HeadAndNeck(datafolder=data_path, nsubjects=nsubjects)

Note

You can skip this code block if you use your own data.

Identify the data structure: change the fields below accordingly if you use your own dataset.

imagedatadir = os.path.join(data_path, 'stwstrategyhn1')
image_file_name = 'image.nii.gz'
segmentation_file_name = 'mask.nii.gz'

# File in which the labels (i.e. the outcome you want to predict) are stated.
# Again, change this accordingly if you use your own data.
label_file = os.path.join(data_path, 'Examplefiles', 'pinfo_HN.csv')

# Name of the label you want to predict
if modus == 'binary_classification':
    # Classification: predict a binary (0 or 1) label
    label_name = ['imaginary_label_1']

elif modus == 'regression':
    # Regression: predict a continuous label
    label_name = ['Age']

elif modus == 'multiclass_classification':
    # Multiclass classification: predict several mutually exclusive binary labels together
    label_name = ['imaginary_label_1', 'complement_label_1']

# Determine whether we want to do a coarse quick experiment, or a full lengthy
# one. Again, change this accordingly if you use your own data.
coarse = True

# Give your experiment a name
experiment_name = 'Example_STWStrategyHN'

# Instead of the default tempdir, let's put the temporary output in a subfolder
# in the same folder as this script
tmpdir = os.path.join(script_path, 'WORC_' + experiment_name)
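For multiclass_classification, the columns in label_name are mutually exclusive binary labels, so each patient should have exactly one of them set to 1. The sketch below checks this using only the standard library. It is a hypothetical helper, not part of WORC, and it assumes a comma-separated label file with a header row, one row per patient, 0/1 values in the label columns, and a patient identifier column named 'Patient' (an assumption; compare with the example pinfo_HN.csv for the format WORC expects). The example data is made up.

```python
import csv
import io


def find_non_exclusive_rows(csv_text, label_names, id_column='Patient'):
    """Return IDs of patients whose label columns are not exactly one-hot.

    Hypothetical helper, not part of WORC. Assumes a header row, 0/1
    values in the label columns and an ID column named id_column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        # float() first, so values written as "1.0" are accepted as well
        values = [int(float(row[name])) for name in label_names]
        if sum(values) != 1:
            bad.append(row[id_column])
    return bad


# Made-up example: Patient_003 has both labels set, which is not allowed.
example = """Patient,imaginary_label_1,complement_label_1
Patient_001,1,0
Patient_002,0,1
Patient_003,1,1
"""
print(find_non_exclusive_rows(example,
                              ['imaginary_label_1', 'complement_label_1']))
# prints ['Patient_003']
```

To check your own file, you could pass open(label_file).read() as csv_text together with your label_name list.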

The actual experiment

After defining the inputs, the following code can be used to run your first experiment.

# Create a Simple WORC object
experiment = SimpleWORC(experiment_name)

# Set the input data according to the variables we defined earlier
experiment.images_from_this_directory(imagedatadir,
                                      image_file_name=image_file_name,
                                      is_training=True)
experiment.segmentations_from_this_directory(imagedatadir,
                                             segmentation_file_name=segmentation_file_name,
                                             is_training=True)
experiment.labels_from_this_file(label_file)
experiment.predict_labels(label_name)

# Set the types of images WORC has to process. Used in fingerprinting
# Valid quantitative types are ['CT', 'PET', 'Thermography', 'ADC']
# Valid qualitative types are ['MRI', 'DWI', 'US']
experiment.set_image_types(['CT'])

# Use the standard workflow for your specific modus
if modus == 'binary_classification':
    experiment.binary_classification(coarse=coarse)
elif modus == 'regression':
    experiment.regression(coarse=coarse)
elif modus == 'multiclass_classification':
    experiment.multiclass_classification(coarse=coarse)

# Set the temporary directory
experiment.set_tmpdir(tmpdir)

# Run the experiment!
experiment.execute()
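The if/elif chain above silently skips the workflow configuration when modus contains a typo (for example 'binary_clasification'), because none of the branches match. A stricter dispatch could look like the sketch below. The function configure_modus is hypothetical, and to keep the example runnable without WORC installed it uses a hypothetical stand-in class whose method names mirror the SimpleWORC calls used above; in your own script you would pass the real experiment object instead.

```python
VALID_MODES = ('binary_classification', 'regression',
               'multiclass_classification')


def configure_modus(experiment, modus, coarse=True):
    """Call the workflow method named after modus, or fail loudly."""
    if modus not in VALID_MODES:
        raise ValueError(f"Unknown modus {modus!r}; "
                         f"choose one of {', '.join(VALID_MODES)}.")
    # In the script above, all three workflow methods take a coarse keyword.
    getattr(experiment, modus)(coarse=coarse)


class FakeExperiment:
    """Hypothetical stand-in for SimpleWORC that only records calls."""

    def __init__(self):
        self.calls = []

    def binary_classification(self, coarse=True):
        self.calls.append(('binary_classification', coarse))

    def regression(self, coarse=True):
        self.calls.append(('regression', coarse))

    def multiclass_classification(self, coarse=True):
        self.calls.append(('multiclass_classification', coarse))


fake = FakeExperiment()
configure_modus(fake, 'regression', coarse=False)
print(fake.calls)  # prints [('regression', False)]
```

With the real object, configure_modus(experiment, modus, coarse=coarse) would take the place of the if/elif chain; a misspelled modus then raises a ValueError before anything is executed.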

Note

Precomputed features can be used instead of images and masks by calling experiment.features_from_this_directory(featuresdatadir) in a similar fashion.

Analysis of the results

There are two main outputs: the features for each patient/object, and the overall performance. These are stored as .hdf5 and .json files, respectively. By default, they are saved in the so-called “fastr output mount”, in a subfolder named WORC_ followed by your experiment name.

# Locate output folder
outputfolder = fastr.config.mounts['output']
experiment_folder = os.path.join(outputfolder, 'WORC_' + experiment_name)

print(f"Your output is stored in {experiment_folder}.")

# Read the features for the first patient
# NOTE: we use the glob package for scanning a folder to find specific files
feature_files = glob.glob(os.path.join(experiment_folder,
                                       'Features',
                                       'features_*.hdf5'))

if len(feature_files) == 0:
    raise ValueError('No feature files found: your network has failed.')

feature_files.sort()
featurefile_p1 = feature_files[0]
features_p1 = pd.read_hdf(featurefile_p1)

# Read the overall performance
performance_file = os.path.join(experiment_folder, 'performance_all_0.json')
if not os.path.exists(performance_file):
    raise ValueError(f'No performance file {performance_file} found: your network has failed.')

with open(performance_file, 'r') as fp:
    performance = json.load(fp)

# Print the feature values and names
print("Feature values from first patient:")
for v, l in zip(features_p1.feature_values, features_p1.feature_labels):
    print(f"\t {l} : {v}.")

# Print the output performance
print("\n Performance:")
stats = performance['Statistics']
stats.pop('Percentages', None)  # Omitted for brevity; no error if the key is absent
for k, v in stats.items():
    print(f"\t {k} {v}.")

Note

The performance is probably horrible, which is expected as we ran the experiment on coarse settings. These settings are only recommended for testing; see also below.

Tips and Tricks

For tips and tricks on running a full experiment instead of this simple example, adding more evaluation options, debugging a crashed network, etcetera, please see the User Manual chapter or the Additional functionality chapter. If you run into any issues, check the FAQ, open an issue on the WORC Github, or feel free to mail me.

We advise you to look at the docstrings of the SimpleWORC functions introduced in this tutorial and to explore the other SimpleWORC functions, as SimpleWORC offers much more functionality than presented here. See the documentation: https://worc.readthedocs.io/en/latest/autogen/WORC.facade.html#WORC.facade.simpleworc.SimpleWORC

Some things we advise you to always do:

Run actual experiments on the full settings (coarse=False):

coarse = False
experiment.binary_classification(coarse=coarse)

Note

This will result in more computation time. We therefore recommend running this script on either a cluster or a high-performance PC. If so, you may change the execution to use multiple cores to speed up computation, just before experiment.execute():

experiment.set_multicore_execution()

This is not required when running WORC on the BIGR or SURFSara Cartesius cluster, as automatic detectors for these clusters have been built into SimpleWORC and BasicWORC.

Add extensive evaluation by calling experiment.add_evaluation() before experiment.execute():

experiment.add_evaluation()

See the “Outputs and evaluation of your network” section in the User Manual chapter for more details on the evaluation outputs.

Changing fields in the configuration can be done with the add_config_overrides function, see below. We recommend doing this after calling the modus function (e.g. experiment.binary_classification()), as these functions also perform config overrides. NOTE: all configuration fields have to be provided as strings.

overrides = {
    'Classification': {
        'classifiers': 'SVM',
    },
}

experiment.add_config_overrides(overrides)
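Because all configuration fields have to be provided as strings, it is easy to accidentally pass a number or boolean. The sketch below is a hypothetical helper, not part of WORC, that converts every leaf value of a nested overrides dictionary with Python's str() before you pass it to experiment.add_config_overrides. Lists and other types are rejected on purpose, since the string format they should take depends on the specific field; check the Config chapter for that. The section name 'ExampleSection' and field 'example_number' are made up for illustration.

```python
def stringify_overrides(overrides):
    """Return a copy of a nested overrides dict with all leaves as strings.

    Hypothetical helper, not part of WORC. Numbers and booleans are
    converted with str(); lists and other types raise TypeError, because
    their expected string format depends on the config field.
    """
    result = {}
    for key, value in overrides.items():
        if isinstance(value, dict):
            result[key] = stringify_overrides(value)  # recurse into sections
        elif isinstance(value, (str, int, float, bool)):
            result[key] = str(value)
        else:
            raise TypeError(f"Cannot safely convert {key!r}={value!r} to a "
                            "string; write it as a string yourself.")
    return result


example_overrides = {
    'Classification': {
        'classifiers': 'SVM',
    },
    'ExampleSection': {  # hypothetical section and field names
        'example_number': 1000,
    },
}
print(stringify_overrides(example_overrides))
# prints {'Classification': {'classifiers': 'SVM'}, 'ExampleSection': {'example_number': '1000'}}
```

The result could then be passed on as experiment.add_config_overrides(stringify_overrides(example_overrides)).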

For a complete overview of all configuration functions, please look at the Config chapter.