.. _quickstart-chapter:

Quick start guide
=================

.. _installation-chapter:

Installation
------------

You can install WORC either using pip, or from the source code. We strongly advice you to install ``WORC`` in a `virtualenv <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.


**Installing via pip**


You can simply install WORC using ``pip``:

.. code-block:: bash

    pip install WORC

**Installing from source code**

To install from source code, use git via the command-line:

.. code-block:: bash

    git clone https://github.com/MStarmans91/WORC.git  # for http
    git clone ssh://git@github.com:MStarmans91/WORC.git # for ssh

**Windows installation**

On Windows, we strongly recommend to install python through the
`Anaconda distribution <https://www.anaconda.com/distribution/#windows>`_.

Regardless of your installation, you will need `Microsoft Visual Studio <https://visualstudio.microsoft.com/vs/features/python>`_: the Community
edition can be downloaded and installed for free.

If you still get an error similar to error: ``Microsoft Visual C++ 14.0 is required. Get it with``
`Microsoft Visual C++ Build Tools   <https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2019>`_
, please follow the respective link and install the requirements.


Tutorials
---------
To start out using WORC, we strongly recommend you to follow the tutorials located in the
`WORCTutorial Github <https://github.com/MStarmans91/WORCTutorial/>`_. This repository
contains tutorials for an introduction to WORC, as well as more advanced workflows. We recommend
starting with the WORCTutorialSimple, of which the part below is an exact copy.

If you run into any issue, you can first debug your network using
`the fastr trace tool <https://fastr.readthedocs.io/en/stable/static/user_manual.html#debugging-a-network-run-with-errors/>`_.
If you're stuck, check out the FAQ first at https://worc.readthedocs.io/en/latest/static/faq.html,
or feel free to post an issue on the `WORC Github <https://github.com/MStarmans91/WORC/>`_.


Running an experiment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We strongly recommend you to follow the tutorials, see the section above. In this section,
a point by point description of the tutorial is given.

Below is the same script as found in the SimpleWORC tutorial found in the `WORCTutorial Github <https://github.com/MStarmans91/WORCTutorial/>`_.
In this tutorial, we will make use of the ``SimpleWORC`` facade, which simplifies interacting with ``WORC``.
Additional information on ``SimpleWORC`` can be found in the
:ref:`User Manual <usermanual-chapter>`. This chapter also includes the documentation on using the ``BasicWORC`` facade,
which is slightly more advanced, and the ``WORC`` object directly, which provides the most advanced options.

Import packages
```````````````

First, import WORC and some additional python packages.

.. code-block:: python

    from WORC import SimpleWORC
    import os

    # These packages are only used in analysing the results
    import pandas as pd
    import json
    import fastr
    import glob

    # If you don't want to use your own data, we use the following example set,
    # see also the next code block in this example.
    from WORC.exampledata.datadownloader import download_HeadAndNeck

    # Define the folder this script is in, so we can easily find the example data
    script_path = os.path.dirname(os.path.abspath(__file__))

    # Determine whether you would like to use WORC for binary_classification,
    # multiclass_classification or regression
    modus = 'binary_classification'

Input
`````
The minimal inputs to WORC are:

    1. Images
    2. Segmentations
    3. Labels

In SimpleWORC, we assume you have a folder "datadir", in which there is a
folder for each patient, where in each folder there is a image.nii.gz and a mask.nii.gz:

    * Datadir

      * Patient_001
          * image.nii.gz
          * mask.nii.gz
      * Patient_002
          * image.nii.gz
          * mask.nii.gz
      * ...

In the example, we will use open source data from the online
`BMIA XNAT platform <https://xnat.bmia.nl/data/archive/projects/stwstrategyhn1/>`_
This dataset consists of CT scans of patients with Head and Neck tumors. We will download
a subset of 20 patients in this folder. You can change this settings if you like.

.. code-block:: python

    nsubjects = 20  # use "all" to download all patients
    data_path = os.path.join(script_path, 'Data')
    download_HeadAndNeck(datafolder=data_path, nsubjects=nsubjects)

.. note:: You can skip this code block if you use your own data.

Identify our data structure: change the fields below accordingly if you use your own dataset.

.. code-block:: python

    imagedatadir = os.path.join(data_path, 'stwstrategyhn1')
    image_file_name = 'image.nii.gz'
    segmentation_file_name = 'mask.nii.gz'

    # File in which the labels (i.e. outcome you want to predict) is stated
    # Again, change this accordingly if you use your own data.
    label_file = os.path.join(data_path, 'Examplefiles', 'pinfo_HN.csv')

    # Name of the label you want to predict
    if modus == 'binary_classification':
        # Classification: predict a binary (0 or 1) label
        label_name = ['imaginary_label_1']

    elif modus == 'regression':
        # Regression: predict a continuous label
        label_name = ['Age']

    elif modus == 'multiclass_classification':
        # Multiclass classification: predict several mutually exclusive binaru labels together
        label_name = ['imaginary_label_1', 'complement_label_1']

    # Determine whether we want to do a coarse quick experiment, or a full lengthy
    # one. Again, change this accordingly if you use your own data.
    coarse = True

    # Give your experiment a name
    experiment_name = 'Example_STWStrategyHN'

    # Instead of the default tempdir, let's but the temporary output in a subfolder
    # in the same folder as this script
    tmpdir = os.path.join(script_path, 'WORC_' + experiment_name)

The actual experiment
`````````````````````

After defining the inputs, the following code can be used to run your first experiment.

.. code-block:: python

    # Create a Simple WORC object
    experiment = SimpleWORC(experiment_name)

    # Set the input data according to the variables we defined earlier
    experiment.images_from_this_directory(imagedatadir,
                                 image_file_name=image_file_name,
                                 is_training=True)
    experiment.segmentations_from_this_directory(imagedatadir,
                                        segmentation_file_name=segmentation_file_name,
                                        is_training=True) 
    experiment.labels_from_this_file(label_file)
    experiment.predict_labels(label_name)

    # Set the types of images WORC has to process. Used in fingerprinting
    # Valid quantitative types are ['CT', 'PET', 'Thermography', 'ADC']
    # Valid qualitative types are ['MRI', 'DWI', 'US']
    experiment.set_image_types(['CT'])

    # Use the standard workflow for your specific modus
    if modus == 'binary_classification':
        experiment.binary_classification(coarse=coarse)
    elif modus == 'regression':
        experiment.regression(coarse=coarse)
    elif modus == 'multiclass_classification':
        experiment.multiclass_classification(coarse=coarse)

    # Set the temporary directory
    experiment.set_tmpdir(tmpdir)

    # Run the experiment!
    experiment.execute()

.. note::  Precomputed features can be used instead of images and masks by instead using ``experiment.features_from_this_directory(featuresdatadir)`` in a similar fashion.

Analysis of the results
```````````````````````
There are two main outputs: the features for each patient/object, and the overall
performance. These are stored as .hdf5 and .json files, respectively. By
default, they are saved in the so-called "fastr output mount", in a subfolder
named after your experiment name.

.. code-block:: python

    # Locate output folder
    outputfolder = fastr.config.mounts['output']
    experiment_folder = os.path.join(outputfolder, 'WORC_' + experiment_name)

    print(f"Your output is stored in {experiment_folder}.")

    # Read the features for the first patient
    # NOTE: we use the glob package for scanning a folder to find specific files
    feature_files = glob.glob(os.path.join(experiment_folder,
                                           'Features',
                                           'features_*.hdf5'))

    if len(feature_files) == 0:
        raise ValueError('No feature files found: your network has failed.')

    feature_files.sort()
    featurefile_p1 = feature_files[0]
    features_p1 = pd.read_hdf(featurefile_p1)

    # Read the overall peformance
    performance_file = os.path.join(experiment_folder, 'performance_all_0.json')
    if not os.path.exists(performance_file):
        raise ValueError(f'No performance file {performance_file} found: your network has failed.')

    with open(performance_file, 'r') as fp:
        performance = json.load(fp)

    # Print the feature values and names
    print("Feature values from first patient:")
    for v, l in zip(features_p1.feature_values, features_p1.feature_labels):
        print(f"\t {l} : {v}.")

    # Print the output performance
    print("\n Performance:")
    stats = performance['Statistics']
    del stats['Percentages']  # Omitted for brevity
    for k, v in stats.items():
        print(f"\t {k} {v}.")

.. note:: The performance is probably horrible, which is expected as we ran the experiment on coarse settings. These settings are recommended to only use for testing: see also below.


Tips and Tricks
```````````````

For tips and tricks on running a full experiment instead of this simple
example, adding more evaluation options, debugging a crashed network etcetera,
please go to 
We advice you to look at the docstrings of the SimpleWORC functions
introduced in this tutorial, and explore the other SimpleWORC functions,
s SimpleWORC offers much more functionality than presented here.

For tips and tricks on running a full experiment instead of this simple
example, adding more evaluation options, debugging a crashed network etcetera,
please go to :ref:`User Manual <usermanual-chapter>` chapter or
the :ref:`Additional functionality <additonalfunctionality-chapter>` chapter. If you
run into any issues, check the :ref:`FAQ <faq-chapter>`,
make an issue on the WORC Github, or feel free to mail me.

We advice you to look at the docstrings of the SimpleWORC functions
introduced in this tutorial, and explore the other SimpleWORC functions,
as SimpleWORC offers much more functionality than presented here, see
the documentation: https://worc.readthedocs.io/en/latest/autogen/WORC.facade.html#WORC.facade.simpleworc.SimpleWORC

Some things we would advice to always do:

* Run actual experiments on the full settings (coarse=False):

    .. code-block:: python

        coarse = False
        experiment.binary_classification(coarse=coarse)

.. note:: This will result in more computation time. We therefore recommmend
  to run this script on either a cluster or high performance PC. If so,
  you may change the execution to use multiple cores to speed up computation
  just before before ``experiment.execute()``:

    .. code-block:: python

        experiment.set_multicore_execution()

  This is not required when running WORC on the BIGR or SURFSara Cartesius cluster,
  as automatic detectors for these clusters have been built into SimpleWORC and BasicWORC.

* Add extensive evaluation: ``experiment.add_evaluation()`` before ``experiment.execute()``:

    .. code-block:: python

        experiment.add_evaluation()

  See the "Outputs and evaluation of your network" section in the :ref:`User Manual <usermanual-chapter>`
  chapter for more details on the evaluation outputs.

Changing fields in the configuration can be done with the add_config_overrides function, see below. 
We recommend doing this after the modus part, as these also perform config_overrides.
NOTE: all configuration fields have to be provided as strings.

    .. code-block:: python
            
        overrides = {
                'Classification': {
                    'classifiers': 'SVM',
                    },
                }

        experiment.add_config_overrides(overrides)

  For a complete overview of all configuration functions, please look at the
  :ref:`Config chapter <config-chapter>`.