Additional functionality

When using SimpleWORC, or WORC with similar simple configuration settings, you can already benefit from the main functionality of WORC, i.e. the automatic algorithm optimization. However, several additional functionalities are provided, which are discussed in this chapter.

For a description of the radiomics features, please see the radiomics features chapter. For a description of the data mining components, see the data mining chapter. All other components are discussed here.

For a comprehensive overview of all functions and parameters, please look at the config chapter.

Image Preprocessing

Preprocessing of the image and, accordingly, the mask is done in the WORC.processing.preprocessing and WORC.processing.segmentix scripts, respectively. Options for preprocessing the image include, in the following order:

  1. N4 Bias field correction, see also https://simpleitk.readthedocs.io/en/master/link_N4BiasFieldCorrection_docs.html.

  2. Checking and optionally correcting the spacing if it’s 1x1x1 and the DICOM metadata says otherwise.

  3. Clipping of the image intensities above and below a certain value.

  4. Normalization, see WORC.processing.preprocessing.normalize_image for all options.

  5. Transposing the image to another “main” orientation, e.g. axial.

  6. Resampling the image to a different spacing.
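Steps 3 and 4 can be illustrated with a minimal sketch. It is not WORC's implementation: the function names, the clipping range, and the choice of z-scoring as the normalization are assumptions made for illustration, and the image is represented as a flat list of intensities so that the snippet runs without any imaging library.

```python
from statistics import mean, pstdev


def clip_intensities(values, lower, upper):
    """Clip every intensity to the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def zscore(values):
    """Normalize to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant image carries no contrast; map everything to zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical flattened intensities with one extreme outlier.
intensities = [10.0, 20.0, 30.0, 40.0, 5000.0]

# Clipping comes before normalization, so the outlier no longer
# dominates the mean and standard deviation used for scaling.
clipped = clip_intensities(intensities, lower=0.0, upper=100.0)
normalized = zscore(clipped)
```

Because clipping precedes normalization in the order listed above, the normalization statistics are computed on the clipped intensities.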

Options for preprocessing the segmentation include:

  1. Hole filling. Many feature computations cannot deal with holes.

  2. Removing small objects. Many feature computations cannot deal with multiple objects in a single segmentation.

  3. Extracting the largest blob. Many feature computations cannot deal with multiple objects in a single segmentation.

  4. Instead of using the full segmentation, extracting a ring around the border of the segmentation to compute the features on. The ring captures both the inner and outer border.

  5. Dilating the contour.

  6. Masking the contour with another contour.

  7. When assuming the same image and metadata, copying the metadata of the image to the segmentation.

  8. Checking and optionally correcting the spacing if it’s 1x1x1 and the DICOM metadata says otherwise. Same as image preprocessing step 2.

  9. Transposing the segmentation to another “main” orientation, e.g. axial. Same as image preprocessing step 5.

  10. Resampling the segmentation to a different spacing. Same as image preprocessing step 6.
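To make the idea of keeping only the largest object concrete, the following is a minimal sketch on a small 2D binary mask. It is not the code used by WORC.processing.segmentix: the function name, the 4-connectivity, and the nested-list representation are assumptions chosen so the snippet runs with the standard library only.

```python
from collections import deque


def largest_blob(mask):
    """Return a copy of a 2D binary mask keeping only its largest
    4-connected object (hypothetical helper, for illustration)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected object.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    # Rebuild an empty mask and paint only the largest object.
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
cleaned = largest_blob(mask)
```

After this step the mask contains a single object, which is what feature computations that cannot handle multiple objects require.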

Feature scaling

The default method for feature scaling in WORC is a robust version of z-scoring. Additional options include:

  1. Regular z-scoring

  2. MinMax scaling, i.e., scaling to a range between 0 and 1

  3. Scaling by centering using the median and IQR

  4. A combination of z-scoring with a logarithmic transform and a correction term to better cope with outliers and non-normally distributed features [CIT1].
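The first three options can be sketched for a single feature as follows. This is an illustration under stated assumptions, not WORC's scaler code: the function names are hypothetical, quartiles are computed with the standard library's inclusive method, and the feature is assumed not to be constant (otherwise the divisions below would fail).

```python
from statistics import mean, median, pstdev, quantiles


def zscore_scale(values):
    """Regular z-scoring: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def minmax_scale(values):
    """MinMax scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def median_iqr_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values)
    return [(v - center) / (q3 - q1) for v in values]


# Hypothetical feature values with one outlier.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]

z = zscore_scale(feature)
mm = minmax_scale(feature)
robust = median_iqr_scale(feature)
```

With the outlier present, MinMax scaling compresses the first four values into a narrow band near 0, while median and IQR scaling keeps them evenly spread; this is the kind of sensitivity to outliers that robust scaling variants aim to reduce.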

Image Registration

When using multiple modalities or sequences, and there is only a segmentation on a single image, image registration is applied to spatially align all sequences and warp the segmentation to the other images through elastix [CIT2]. Usage of elastix is automatically included in WORC when only a single segmentation and multiple modalities are supplied. The image on which the segmentation is provided is used as the moving image and the others as the fixed images, since the segmentation will be moved from the segmented image to the others.

Registration is by default performed using a rigid transformation model, with mutual information as the similarity metric and the adaptive stochastic gradient descent optimizer. Manual overrides of these defaults are included in the WORC configuration.

When using elastix, parameter files have to be provided in the network.Elastix_Para object, e.g.:

network.Elastix_Para = [['Parameters_Rigid.txt', 'Parameters_BSpline.txt']]

The outer list defines the parameter files used per modality. If only one element is provided, the same will be applied for all modalities. Each element of the list should be a list of its own, containing the filenames of the elastix parameter files. In the example, we provided two files, resulting in first a rigid registration being performed, followed by a B-spline registration. Examples of elastix parameter files can be found at https://github.com/SuperElastix/ElastixModelZoo/tree/master/models/default
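The nesting of this list can be illustrated with a small sketch. The helper expand_elastix_para and the n_modalities argument are hypothetical and only mirror the rule described above; they are not part of WORC, and the snippet does not call WORC or elastix.

```python
def expand_elastix_para(elastix_para, n_modalities):
    """Hypothetical helper: return one list of parameter files per modality.

    A single inner list is reused for every modality; otherwise one
    inner list per modality is expected.
    """
    if len(elastix_para) == 1:
        return [list(elastix_para[0]) for _ in range(n_modalities)]
    if len(elastix_para) != n_modalities:
        raise ValueError("Provide one inner list, or one per modality.")
    return [list(files) for files in elastix_para]


# One inner list: rigid followed by B-spline, shared by all modalities.
shared = [['Parameters_Rigid.txt', 'Parameters_BSpline.txt']]
per_modality = expand_elastix_para(shared, n_modalities=2)

# Two inner lists: a different registration chain for each modality.
custom = [
    ['Parameters_Rigid.txt'],
    ['Parameters_Rigid.txt', 'Parameters_BSpline.txt'],
]
per_modality_custom = expand_elastix_para(custom, n_modalities=2)
```

Within each inner list the order matters, since the registrations are performed in the order in which the files are given.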

ComBat

Radiomics studies commonly include multicenter data, resulting in heterogeneity in the acquisition protocols. As radiomics features are generally sensitive to these variations, this limits their repeatability and reproducibility. To compensate for the differences in acquisition, feature harmonization techniques may be used, of which ComBat is one of the most frequently used. In ComBat, feature distributions are harmonized for variations in the imaging acquisition, e.g. due to differences in hospitals, manufacturers, or acquisition parameters. The dataset is divided into groups based on these differences, and a correction for the error caused by these differences is estimated using empirical Bayes.
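The idea of correcting per group can be sketched with a deliberately simplified stand-in: each batch of a single feature is shifted and rescaled to the pooled mean and standard deviation. This is not ComBat itself, as it uses no empirical Bayes estimation and no moderation variable, and the function name and data are hypothetical.

```python
from statistics import mean, pstdev


def align_batches(values, batches):
    """Shift and rescale each batch to the pooled mean and standard deviation.

    Simplified illustration only: ComBat additionally pools information
    across features through empirical Bayes and can preserve the effect
    of a moderation variable.
    """
    pooled_mean, pooled_std = mean(values), pstdev(values)

    # Group the feature values by batch label.
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    stats = {b: (mean(g), pstdev(g)) for b, g in groups.items()}

    aligned = []
    for value, batch in zip(values, batches):
        mu, sigma = stats[batch]
        # A constant batch has no spread to rescale.
        scale = pooled_std / sigma if sigma > 0 else 1.0
        aligned.append(pooled_mean + (value - mu) * scale)
    return aligned


# Hypothetical feature measured in two hospitals with an offset.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ['A', 'A', 'A', 'B', 'B', 'B']
aligned = align_batches(values, batches)
```

After alignment both batches share the same mean and spread, so the offset between the two hypothetical hospitals no longer separates the samples.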

ComBat feature harmonization is embedded in WORC: a wrapper around the original ComBat code, compatible with the other tools provided by WORC, is included in the WORC installation. It can be turned on in the configuration, which includes options to use empirical Bayes or not, a parametric or non-parametric approach, and a moderation variable.

When using ComBat, the following configurations should be done:

  1. Set config['General']['ComBat'] to 'True'.

  2. To change the ComBat parameters (i.e. which batch and moderation variable to use), change the relevant config fields, see the Config chapter.

  3. WORC extracts the batch and moderation variables from the label file which you also use to give WORC the actual label you want to predict. The same format therefore applies, see the User manual for more details.
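Step 1 can be sketched as follows. As an assumption made so the snippet runs without WORC installed, a plain configparser object stands in for the configuration object that WORC provides; only the ['General']['ComBat'] field is taken from the text above, and the remaining ComBat fields are left to the Config chapter.

```python
import configparser

# Stand-in for the WORC configuration object (assumption for illustration).
config = configparser.ConfigParser()
config.optionxform = str  # keep the capitalization of option names

config['General'] = {}
# Note that the value is the string 'True', not the boolean True.
config['General']['ComBat'] = 'True'

combat_enabled = config['General'].getboolean('ComBat')
```

Setting the value as a string matches the quoted 'True' in step 1.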

Note

In line with current literature, ComBat is applied once on the full dataset straight after the feature extraction, thus before the actual hyperoptimization. Hence, to avoid serious overfitting, we advise you to NEVER use the variable you are trying to predict as the moderation variable.

Multilabel classification and regression

While WORC was primarily designed for binary classification, as also demonstrated in the main manuscript, various other types of machine learning workflows have been included as well.

In multilabel classification, as the term is used in WORC, several mutually exclusive classes are predicted at the same time. Note that much of the machine learning literature calls this setting multiclass classification and reserves multilabel for classes that do not have to be mutually exclusive. When using multilabel classification in WORC, the only difference with binary classification in the workflows is in the machine learning component. Other components, e.g. feature selection and resampling, that do not support multiclass classification are performed per class in a one-vs-rest approach. Some of the classifiers naturally support multilabel classification (i.e., random forest, AdaBoost, and extreme gradient boosting) and are thus used as is. Others only support binary classification (i.e., LDA, QDA, Naive Bayes, SVM, logistic regression) and are therefore also trained per class in a one-vs-rest approach and combined into a single multilabel model. In the evaluation, the same metrics as in binary classification are evaluated per class. Additionally, the multiclass AUC [CIT3] and multiclass BCR are computed.
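The one-vs-rest approach can be sketched with a toy example. The centroid-based binary scorer below is a hypothetical stand-in and not one of the classifiers in WORC, and combining the per-class models by taking the highest score is an assumption made for illustration.

```python
def fit_binary(samples, targets):
    """Toy binary scorer comparing distances to the two class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([s for s, t in zip(samples, targets) if t == 1])
    neg = centroid([s for s, t in zip(samples, targets) if t == 0])

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos  # higher means closer to the positive class

    return score


def fit_one_vs_rest(samples, labels):
    """Train one binary model per class: that class versus all others."""
    classes = sorted(set(labels))
    return {
        c: fit_binary(samples, [1 if label == c else 0 for label in labels])
        for c in classes
    }


def predict(models, x):
    """Combine the per-class models by picking the highest score."""
    return max(models, key=lambda c: models[c](x))


# Hypothetical two-feature dataset with three mutually exclusive classes.
samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ['a', 'a', 'b', 'b', 'c', 'c']

models = fit_one_vs_rest(samples, labels)
predictions = [predict(models, x) for x in ([0.2, 0.5], [5, 5], [9.5, 0.5])]
```

Each binary model only answers the question "this class or any other", which is why binary-only methods can still be used when there are more than two classes.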

In regression, a continuous label is predicted. As there are no classes, all class-based feature and sample preprocessing methods (RELIEF, univariate testing, and all resampling methods) cannot be used. In the machine learning component, WORC includes the following regressors:

  1. linear regression;

  2. support vector machines;

  3. random forest;

  4. elastic net;

  5. LASSO;

  6. ridge regression;

  7. AdaBoost;

  8. extreme gradient boosting (XGBoost).

The optimization is by default based on the R2-score. Performance metrics computed are the R2-score, mean squared error, intraclass correlation coefficient, Pearson coefficient and p-value, and Spearman coefficient and p-value.
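Three of these metrics can be written out in a few lines. The following is a minimal sketch using their textbook definitions, not the evaluation code of WORC; the function names and the example values are hypothetical, and the p-values, Spearman coefficient, and intraclass correlation coefficient are omitted.

```python
from math import sqrt


def mean_squared_error(y_true, y_pred):
    """Average squared difference between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical continuous labels and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
r = pearson_r(y_true, y_pred)
```

The R2-score equals 1 for perfect predictions and drops to 0 when the model does no better than always predicting the mean label, which makes it a convenient default optimization target.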

References

CIT1

Chen, Jianan, et al. AMINN: Autoencoder-based Multiple Instance Neural Network for Outcome Prediction of Multifocal Liver Metastases. arXiv preprint arXiv:2012.06875 (2020).

CIT2

Klein, Stefan, et al. Elastix: a toolbox for intensity-based medical image registration. IEEE transactions on medical imaging 29.1 (2009): 196-205.

CIT3

Hand, David J., and Robert J. Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine learning 45.2 (2001): 171-186.