Configuration¶
Introduction¶
WORC has defaults for all settings so it can be run out of the box to test the examples. However, you may want to alter the fastr configuration to your system settings, e.g. to set the locations of your input and output folders and how much you want to parallelize the execution.
Fastr will search for a config file named config.py in the $FASTRHOME directory (which defaults to ~/.fastr/ if it is not set); if $FASTRHOME is set, ~/.fastr/ will be ignored. Additionally, .py files from the $FASTRHOME/config.d folder will be parsed as well. You will see that upon installation, WORC has already put a WORC_config.py file in the config.d folder.
As WORC and the default tools used are mostly Python based, we've chosen to put our configuration in a configparser object. This has several advantages. First, the object can be treated as a Python dictionary and thus is easily adjusted. Second, each tool can be set to parse only specific parts of the configuration, enabling us to supply one file to all tools instead of needing many parameter files.
Creation and interaction¶
The default configuration is generated through the WORC.defaultconfig() function. You can then change settings as you would in a dictionary and append the config to the configs source:
>>> network = WORC.WORC('somename')
>>> config = network.defaultconfig()
>>> config['Classification']['classifiers'] = 'RF'
>>> network.configs.append(config)
When executing the WORC.set() command, the config objects are saved as .ini files in the WORC.fastr_tempdir folder and added to the WORC.fastrconfigs() source.
Below are some details on several of the fields in the configuration. Note that for many of the fields, we currently only provide one default value. However, when adding your own tools, these fields can be adjusted to your specific settings.
WORC performs Combined Algorithm Selection and Hyperparameter optimization (CASH). The configuration determines how the optimization is performed and which hyperparameters and models will be included. Repeating specific models/parameters in the config will make them more likely to be used, e.g.
>>> config['Classification']['classifiers'] = 'SVM, SVM, LR'
means that the SVM is 2x more likely to be tested in the model selection than LR.
Note
All fields in the config must be supplied as strings. A list can be created by using commas for separation, e.g. 'value1, value2'.
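For instance, to supply several classifiers at once, use a single comma-separated string; a minimal sketch, using the config object created above:

>>> config['Classification']['classifiers'] = 'SVM, RF, LR'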
Contents¶
The config object can be indexed as config[key][subkey] = value. The various keys, subkeys, and values (description, defaults, and options) can be found below.
The config contains the following keys, each of which is described in its own section below: Bootstrap, Classification, ComBat, CrossValidation, Ensemble, Evaluation, FeatPreProcess, Featsel, FeatureScaling, General, HyperOptimization, ImageFeatures, Imputation, Labels, Normalize, PyRadiomics, SampleProcessing, Segmentix, and SelectFeatGroup.
General¶
These fields contain general settings for when using WORC. For more info on the Joblib settings, which are used in the Joblib Parallel function, see the joblib documentation. When you run WORC on a cluster with nodes supporting only a single core to be used per node, e.g. the BIGR cluster, use only 1 core and threading as a backend.
Description:
| Subkey | Description |
|---|---|
| cross_validation | Determine whether a cross-validation will be performed or not. Obsolete, will be removed. |
| Segmentix | Determine whether to use the Segmentix tool for segmentation preprocessing. |
| FeatureCalculators | Specifies which feature calculation tools should be used. A list can be provided to use multiple tools. |
| Preprocessing | Specifies which tool will be used for image preprocessing. |
| RegistrationNode | Specifies which tool will be used for image registration. |
| TransformationNode | Specifies which tool will be used for applying image transformations. |
| Joblib_ncores | Number of cores to be used by joblib for multicore processing. |
| Joblib_backend | Type of backend to be used by joblib for multicore processing. |
| tempsave | Determines whether the result of every cross-validation iteration is saved, in addition to the result after all iterations. Especially useful for debugging. |
| AssumeSameImageAndMaskMetadata | Make the assumption that the image and mask have the same metadata. If True and there is a mismatch, metadata from the image will be copied to the mask. |
| ComBat | Whether to use ComBat feature harmonization on your full dataset, i.e. not in a train-test setting. See https://github.com/Jfortin1/ComBatHarmonization for more information. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| cross_validation | True | True, False |
| Segmentix | True | True, False |
| FeatureCalculators | [predict/CalcFeatures:1.0, pyradiomics/Pyradiomics:1.0] | predict/CalcFeatures:1.0, pyradiomics/Pyradiomics:1.0, pyradiomics/CF_pyradiomics:1.0, your own tool reference |
| Preprocessing | worc/PreProcess:1.0 | worc/PreProcess:1.0, your own tool reference |
| RegistrationNode | elastix4.8/Elastix:4.8 | elastix4.8/Elastix:4.8, your own tool reference |
| TransformationNode | elastix4.8/Transformix:4.8 | elastix4.8/Transformix:4.8, your own tool reference |
| Joblib_ncores | 1 | Integer > 0 |
| Joblib_backend | threading | multiprocessing, threading |
| tempsave | False | True, False |
| AssumeSameImageAndMaskMetadata | False | True, False |
| ComBat | False | True, False |
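For example, on a cluster where each node only supports a single core per job (such as the BIGR cluster mentioned above), you could restrict joblib accordingly; a minimal sketch, using the config object from the Creation and interaction section:

>>> config['General']['Joblib_ncores'] = '1'
>>> config['General']['Joblib_backend'] = 'threading'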
Segmentix¶
These fields are only important if you specified using the segmentix tool in the general configuration.
Description:
| Subkey | Description |
|---|---|
| mask | If a mask is supplied, should the mask be subtracted from the contour or multiplied. |
| segtype | If Ring, a ring around the segmentation will be used as contour. |
| segradius | Define the radius of the ring used if segtype is Ring. |
| N_blobs | How many of the largest blobs are extracted from the segmentation. If None, no blob extraction is used. |
| fillholes | Determines whether hole filling will be used. |
| remove_small_objects | Determines whether small objects will be removed. |
| min_object_size | Minimum size of objects in voxels to not be removed if small objects are removed. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| mask | subtract | subtract, multiply |
| segtype | None | None, Ring |
| segradius | 5 | Integer > 0 |
| N_blobs | 0 | Integer > 0 |
| fillholes | True | True, False |
| remove_small_objects | False | True, False |
| min_object_size | 2 | Integer > 0 |
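For example, to extract features from a ring of radius 5 around the segmentation instead of the full contour, you could set the segtype and segradius fields described above; a minimal sketch:

>>> config['Segmentix']['segtype'] = 'Ring'
>>> config['Segmentix']['segradius'] = '5'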
Normalize¶
The preprocessing node acts before the feature extraction on the image. Currently, only normalization is included: hence the dictionary name is Normalize. Additionally, scans with image type CT (see later in the tutorial) provided as DICOM are scaled to Hounsfield Units.
Description:
| Subkey | Description |
|---|---|
| ROI | If a mask is supplied and this is set to True, normalize the image based on the supplied ROI. Otherwise, the full image is used for normalization using the SimpleITK Normalize function. Setting this to False results in no normalization being applied. |
| ROIDetermine | Choose whether an ROI for normalization is provided, or Otsu thresholding is used to determine one. |
| ROIdilate | Determine whether the ROI has to be dilated with a disc element or not. |
| ROIdilateradius | Radius of the disc element to be used in ROI dilation. |
| Method | Method used for normalization if an ROI is supplied. Currently, z-scoring or using the minimum and median of the ROI can be used. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| ROI | Full | True, False, Full |
| ROIDetermine | Provided | Provided, Otsu |
| ROIdilate | False | True, False |
| ROIdilateradius | 10 | Integer > 0 |
| Method | z_score | z_score, minmed |
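For example, a minimal sketch of z-scoring the image based on a supplied ROI instead of the full image:

>>> config['Normalize']['ROI'] = 'True'
>>> config['Normalize']['ROIDetermine'] = 'Provided'
>>> config['Normalize']['Method'] = 'z_score'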
ImageFeatures¶
If using the PREDICT toolbox, you can specify some settings for the feature computation here. Also, you can select whether certain features are computed or not.
Description:
| Subkey | Description |
|---|---|
| shape | Determine whether shape features are computed or not. |
| histogram | Determine whether histogram features are computed or not. |
| orientation | Determine whether orientation features are computed or not. |
| texture_Gabor | Determine whether Gabor texture features are computed or not. |
| texture_LBP | Determine whether LBP texture features are computed or not. |
| texture_GLCM | Determine whether GLCM texture features are computed or not. |
| texture_GLCMMS | Determine whether GLCM Multislice texture features are computed or not. |
| texture_GLRLM | Determine whether GLRLM texture features are computed or not. |
| texture_GLSZM | Determine whether GLSZM texture features are computed or not. |
| texture_NGTDM | Determine whether NGTDM texture features are computed or not. |
| coliage | Determine whether coliage features are computed or not. |
| vessel | Determine whether vessel features are computed or not. |
| log | Determine whether LoG features are computed or not. |
| phase | Determine whether local phase features are computed or not. |
| image_type | Modality of the images supplied. Determines how the image is loaded. |
| gabor_frequencies | Frequencies of the Gabor filters used: can be a single float or a list. |
| gabor_angles | Angles of the Gabor filters in degrees: can be a single integer or a list. |
| GLCM_angles | Angles used in GLCM computation in radians: can be a single float or a list. |
| GLCM_levels | Number of grayscale levels used in discretization before GLCM computation. |
| GLCM_distances | Distance(s) used in GLCM computation in pixels: can be a single integer or a list. |
| LBP_radius | Radii used for LBP computation: can be a single integer or a list. |
| LBP_npoints | Number(s) of points used in LBP computation: can be a single integer or a list. |
| phase_minwavelength | Minimal wavelength in pixels used for phase features. |
| phase_nscale | Number of scales used in phase feature computation. |
| log_sigma | Standard deviation(s) in pixels used in LoG feature computation: can be a single integer or a list. |
| vessel_scale_range | Scale in pixels used for the Frangi vessel filter. Given as a minimum and a maximum. |
| vessel_scale_step | Step size used to go from minimum to maximum scale in the Frangi vessel filter. |
| vessel_radius | Radius to determine the boundary between the inner part and the edge in the Frangi vessel filter. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| shape | True | True, False |
| histogram | True | True, False |
| orientation | True | True, False |
| texture_Gabor | True | True, False |
| texture_LBP | True | True, False |
| texture_GLCM | True | True, False |
| texture_GLCMMS | True | True, False |
| texture_GLRLM | False | True, False |
| texture_GLSZM | False | True, False |
| texture_NGTDM | False | True, False |
| coliage | False | True, False |
| vessel | True | True, False |
| log | True | True, False |
| phase | True | True, False |
| image_type | CT | CT |
| gabor_frequencies | 0.05, 0.2, 0.5 | Float(s) |
| gabor_angles | 0, 45, 90, 135 | Integer(s) |
| GLCM_angles | 0, 0.79, 1.57, 2.36 | Float(s) |
| GLCM_levels | 16 | Integer > 0 |
| GLCM_distances | 1, 3 | Integer(s) > 0 |
| LBP_radius | 3, 8, 15 | Integer(s) > 0 |
| LBP_npoints | 12, 24, 36 | Integer(s) > 0 |
| phase_minwavelength | 3 | Integer > 0 |
| phase_nscale | 5 | Integer > 0 |
| log_sigma | 1, 5, 10 | Integer(s) |
| vessel_scale_range | 1, 10 | Two integers: min and max |
| vessel_scale_step | 2 | Integer > 0 |
| vessel_radius | 5 | Integer > 0 |
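For example, to disable the Gabor and local phase features and compute LoG features at a single scale, you could adjust the corresponding fields; a minimal sketch:

>>> config['ImageFeatures']['texture_Gabor'] = 'False'
>>> config['ImageFeatures']['phase'] = 'False'
>>> config['ImageFeatures']['log_sigma'] = '1'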
PyRadiomics¶
If using the PyRadiomics toolbox, you can specify some settings for the feature computation here. For more information, see https://pyradiomics.readthedocs.io/en/latest/customization.html.
Description:
| Subkey | Description |
|---|---|
| geometryTolerance | See the PyRadiomics customization documentation. |
| normalize | See the PyRadiomics customization documentation. |
| normalizeScale | See the PyRadiomics customization documentation. |
| interpolator | See the PyRadiomics customization documentation. |
| preCrop | See the PyRadiomics customization documentation. |
| binCount | We advise to use a fixed bin count instead of a fixed bin width, as on imaging modalities such as MRI the scale of the values varies a lot, which is incompatible with a fixed bin width. See the PyRadiomics customization documentation. |
| force2D | See the PyRadiomics customization documentation. |
| force2Ddimension | See the PyRadiomics customization documentation. |
| voxelArrayShift | See the PyRadiomics customization documentation. |
| Original | Enable/disable computation of original image features. |
| Wavelet | Enable/disable computation of wavelet image features. |
| LoG | Enable/disable computation of Laplacian of Gaussian (LoG) image features. |
| label | "Intensity" of the pixels in the mask to be used for feature extraction. If using Segmentix, use 1, as your mask will be boolean. Otherwise, select the integer(s) corresponding to the ROI in your mask. |
| extract_firstorder | Determine whether first order features are computed or not. |
| extract_shape | Determine whether shape features are computed or not. |
| texture_GLCM | Determine whether GLCM features are computed or not. |
| texture_GLRLM | Determine whether GLRLM features are computed or not. |
| texture_GLSZM | Determine whether GLSZM features are computed or not. |
| texture_GLDM | Determine whether GLDM features are computed or not. |
| texture_NGTDM | Determine whether NGTDM features are computed or not. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| geometryTolerance | 0.0001 | Float |
| normalize | False | True, False |
| normalizeScale | 100 | Integer |
| interpolator | sitkBSpline | |
| preCrop | True | True, False |
| binCount | 16 | Integer |
| force2D | False | True, False |
| force2Ddimension | 0 | 0 = axial, 1 = coronal, 2 = sagittal |
| voxelArrayShift | 300 | Integer |
| Original | True | True, False |
| Wavelet | False | True, False |
| LoG | False | True, False |
| label | 1 | Integer |
| extract_firstorder | False | True, False |
| extract_shape | True | True, False |
| texture_GLCM | False | True, False |
| texture_GLRLM | True | True, False |
| texture_GLSZM | True | True, False |
| texture_GLDM | True | True, False |
| texture_NGTDM | True | True, False |
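For example, a minimal sketch that additionally computes wavelet features and uses a finer discretization than the default 16 bins:

>>> config['PyRadiomics']['Wavelet'] = 'True'
>>> config['PyRadiomics']['binCount'] = '32'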
ComBat¶
If using the ComBat toolbox, you can specify some settings for the feature harmonization here. For more information, see https://github.com/Jfortin1/ComBatHarmonization.
Description:
| Subkey | Description |
|---|---|
| language | Name of the software implementation to use. |
| batch | Name of the batch variable, i.e. the variable to correct for. |
| par | Either use the parametric (1) or non-parametric (0) version of ComBat. |
| eb | Either use the empirical Bayes (1) or simple mean-shifting (0) version of ComBat. |
| per_feature | Either use ComBat for all features combined (0) or per feature (1), in which case a second feature equal to the single feature plus random noise will be added if eb=1. |
| excluded_features | Provide substrings of feature labels of features which should be excluded from ComBat. Recommended to use for features unaffected by the batch variable. |
| matlab | If using Matlab, path to the Matlab executable. |
| mod | Name of the moderation variable(s), i.e. variables for which the variation in the features will be preserved. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| language | python | python, matlab |
| batch | Hospital | String |
| par | 1 | 0 or 1 |
| eb | 1 | 0 or 1 |
| per_feature | 0 | 0 or 1 |
| excluded_features | List of strings, comma separated | |
| matlab | C:\Program Files\MATLAB\R2015b\bin\matlab.exe | String |
| mod | Label1, Label2 | String(s), or [] |
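For example, to harmonize features over hospitals with the Python implementation, enable ComBat in the General section and point it to your batch variable; a minimal sketch, assuming your label file contains a Hospital column:

>>> config['General']['ComBat'] = 'True'
>>> config['ComBat']['language'] = 'python'
>>> config['ComBat']['batch'] = 'Hospital'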
FeatPreProcess¶
Before the features are given to the classification function, and thus the hyperoptimization, they can be preprocessed as follows.
Description:
| Subkey | Description |
|---|---|
| Use | If True, use the feature preprocessor in the classify node. Currently, this excludes features with more than 80% NaN values. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| Use | False | Boolean |
Featsel¶
When using the PREDICT toolbox for classification, these settings can be used for feature selection methods. Note that these settings are actually used in the hyperparameter optimization. Hence, you can provide multiple values per field, from which random samples will be drawn, and the best setting in combination with the other hyperparameters is finally selected. Again, these should be formatted as a string containing the actual values, e.g. 'value1, value2'.
Description:
| Subkey | Description |
|---|---|
| Variance | If True, exclude features which have a variance < 0.01. Based on sklearn's VarianceThreshold (https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html). |
| GroupwiseSearch | Randomly select which feature groups to use. Parameters are determined by the SelectFeatGroup config part, see below. |
| SelectFromModel | Percentage of times features are selected by first training a LASSO model. The alpha for the LASSO model is randomly generated. See also sklearn's SelectFromModel. |
| UsePCA | Percentage of times Principal Component Analysis (PCA) is used to select features. |
| PCAType | Method to select the number of components when using PCA: either the number of components that explains 95% of the variance, or a fixed number of components. |
| StatisticalTestUse | Percentage of times a statistical test is used to select features. |
| StatisticalTestMetric | Define the type of statistical test to be used. |
| StatisticalTestThreshold | Specify a threshold for the p-value used in the statistical test to select features. The first element defines the lower boundary, the other the upper boundary. Random sampling will occur between the boundaries. |
| ReliefUse | Percentage of times Relief is used to select features. |
| ReliefNN | Min and max of the number of nearest neighbors search range in Relief. |
| ReliefSampleSize | Min and max of the sample size search range in Relief. |
| ReliefDistanceP | Min and max of the positive distance search range in Relief. |
| ReliefNumFeatures | Min and max of the search range for the number of features selected in Relief. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| Variance | 1.0 | Float |
| GroupwiseSearch | True | Boolean(s) |
| SelectFromModel | 0.0 | Float |
| UsePCA | 0.25 | Float |
| PCAType | 95variance, 10, 50, 100 | Integer(s), 95variance |
| StatisticalTestUse | 0.25 | Float |
| StatisticalTestMetric | MannWhitneyU | ttest, Welch, Wilcoxon, MannWhitneyU |
| StatisticalTestThreshold | -3, 2.5 | Two Integers: loc and scale |
| ReliefUse | 0.25 | Float |
| ReliefNN | 2, 4 | Two Integers: loc and scale |
| ReliefSampleSize | 1, 1 | Two Integers: loc and scale |
| ReliefDistanceP | 1, 3 | Two Integers: loc and scale |
| ReliefNumFeatures | 25, 100 | Two Integers: loc and scale |
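For example, to always apply the groupwise search but never use PCA in the hyperparameter optimization, you could fix the corresponding fields; a minimal sketch:

>>> config['Featsel']['GroupwiseSearch'] = 'True'
>>> config['Featsel']['UsePCA'] = '0.0'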
SelectFeatGroup¶
If the PREDICT feature computation and classification tools are used, then you can do a gridsearch among the various feature groups for the optimal combination. If you do not want this, set all fields to a single value.
Previously, there was a single parameter for the texture features, selecting all, none or a single group. This is still supported, but not recommended, and looks as follows:
Description:
| Subkey | Description |
|---|---|
| shape_features | If True, use shape features in model. |
| histogram_features | If True, use histogram features in model. |
| orientation_features | If True, use orientation features in model. |
| texture_Gabor_features | If True, use Gabor texture features in model. |
| texture_GLCM_features | If True, use GLCM texture features in model. |
| texture_GLDM_features | If True, use GLDM texture features in model. |
| texture_GLCMMS_features | If True, use GLCM Multislice texture features in model. |
| texture_GLRLM_features | If True, use GLRLM texture features in model. |
| texture_GLSZM_features | If True, use GLSZM texture features in model. |
| texture_GLDZM_features | If True, use GLDZM texture features in model. |
| texture_NGTDM_features | If True, use NGTDM texture features in model. |
| texture_NGLDM_features | If True, use NGLDM texture features in model. |
| texture_LBP_features | If True, use LBP texture features in model. |
| patient_features | If True, use patient features in model. |
| semantic_features | If True, use semantic features in model. |
| coliage_features | If True, use coliage features in model. |
| vessel_features | If True, use vessel features in model. |
| phase_features | If True, use phase features in model. |
| fractal_features | If True, use fractal features in model. |
| location_features | If True, use location features in model. |
| rgrd_features | If True, use rgrd features in model. |
| toolbox | List of names of toolboxes to be used, or All. |
| original_features | If True, use original features in model. |
| wavelet_features | If True, use wavelet features in model. |
| log_features | If True, use LoG features in model. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| shape_features | True, False | Boolean(s) |
| histogram_features | True, False | Boolean(s) |
| orientation_features | True, False | Boolean(s) |
| texture_Gabor_features | False | Boolean(s) |
| texture_GLCM_features | True, False | Boolean(s) |
| texture_GLDM_features | True, False | Boolean(s) |
| texture_GLCMMS_features | True, False | Boolean(s) |
| texture_GLRLM_features | True, False | Boolean(s) |
| texture_GLSZM_features | True, False | Boolean(s) |
| texture_GLDZM_features | True, False | Boolean(s) |
| texture_NGTDM_features | True, False | Boolean(s) |
| texture_NGLDM_features | True, False | Boolean(s) |
| texture_LBP_features | True, False | Boolean(s) |
| patient_features | False | Boolean(s) |
| semantic_features | False | Boolean(s) |
| coliage_features | False | Boolean(s) |
| vessel_features | True, False | Boolean(s) |
| phase_features | True, False | Boolean(s) |
| fractal_features | True, False | Boolean(s) |
| location_features | True, False | Boolean(s) |
| rgrd_features | True, False | Boolean(s) |
| toolbox | All, PREDICT, PyRadiomics | All, or name of toolbox (PREDICT, PyRadiomics) |
| original_features | True | Boolean(s) |
| wavelet_features | True, False | Boolean(s) |
| log_features | True, False | Boolean(s) |
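For example, to always include the shape and histogram features instead of searching over them, set those fields to a single value as mentioned above; a minimal sketch:

>>> config['SelectFeatGroup']['shape_features'] = 'True'
>>> config['SelectFeatGroup']['histogram_features'] = 'True'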
Imputation¶
When using the PREDICT toolbox for classification, these settings are used for feature imputation. Note that these settings are actually used in the hyperparameter optimization. Hence, you can provide multiple values per field, from which random samples will be drawn, and the best setting in combination with the other hyperparameters is finally selected.
Description:
| Subkey | Description |
|---|---|
| use | If True, use feature imputation methods to replace NaN values. If False, all NaN features will be set to zero. |
| strategy | Method to be used for imputation. |
| n_neighbors | When using k-Nearest Neighbors (kNN) for feature imputation, determines the number of neighbors used for imputation. Can be a single integer or a list. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| use | True | Boolean(s) |
| strategy | mean, median, most_frequent, constant, knn | mean, median, most_frequent, constant, knn |
| n_neighbors | 5, 5 | Two Integers: loc and scale |
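For example, to always impute missing values with kNN using 5 neighbors instead of sampling over all strategies; a minimal sketch:

>>> config['Imputation']['use'] = 'True'
>>> config['Imputation']['strategy'] = 'knn'
>>> config['Imputation']['n_neighbors'] = '5, 5'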
Classification¶
When using the PREDICT toolbox for classification, you can specify the following settings. Almost all of these are used in CASH. Most of the classifiers are implemented using sklearn; hence descriptions of the hyperparameters can also be found there.
Description:
| Subkey | Description |
|---|---|
| fastr | Use fastr for the optimization gridsearch (recommended on clusters, default) or, if set to False, joblib (recommended for PCs, but not on Windows). |
| fastr_plugin | Name of the execution plugin to be used. By default, the same as the self.fastr_plugin of the WORC object is used. |
| classifiers | Select the estimator(s) to use. Most are implemented using sklearn. For abbreviations, see above. |
| max_iter | Maximum number of iterations to use in training an estimator. Only for specific estimators, see sklearn. |
| SVMKernel | When using an SVM, specify the kernel type. |
| SVMC | Range of the SVM slack parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). |
| SVMdegree | Range of the SVM polynomial degree when using a polynomial kernel. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| SVMcoef0 | Range of the SVM homogeneity parameter. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| SVMgamma | Range of the SVM gamma parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). |
| RFn_estimators | Range of the number of trees in an RF. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| RFmin_samples_split | Range of the minimum number of samples required to split a branch in an RF. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| RFmax_depth | Range of the maximum depth of an RF. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| LRpenalty | Penalty term used in LR. |
| LRC | Range of the regularization strength in LR. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| LDA_solver | Solver used in LDA. |
| LDA_shrinkage | Range of the LDA shrinkage parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). |
| QDA_reg_param | Range of the QDA regularization parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). |
| ElasticNet_alpha | Range of the ElasticNet penalty parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). |
| ElasticNet_l1_ratio | Range of the l1 ratio in ElasticNet. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| SGD_alpha | Range of the SGD penalty parameter. We sample on a uniform log scale: the parameters specify the range of the exponent (a, a + b). |
| SGD_l1_ratio | Range of the l1 ratio in SGD. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| SGD_loss | Loss function of SGD. |
| SGD_penalty | Penalty term in SGD. |
| CNB_alpha | Regularization strength in ComplementNB. We sample on a uniform scale: the parameters specify the range (a, a + b). |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| fastr | True | True, False |
| fastr_plugin | LinearExecution | Any fastr execution plugin. |
| classifiers | SVM, SVM, SVM, RF, LR, LDA, QDA, GaussianNB | SVM, SVR, SGD, SGDR, RF, LDA, QDA, ComplementNB, GaussianNB, LR, RFR, Lasso, ElasticNet. All are estimators from sklearn. |
| max_iter | 100000 | Integer |
| SVMKernel | poly, rbf, linear | poly, linear, rbf |
| SVMC | 0, 6 | Two Integers: loc and scale |
| SVMdegree | 1, 6 | Two Integers: loc and scale |
| SVMcoef0 | 0, 1 | Two Integers: loc and scale |
| SVMgamma | -5, 5 | Two Integers: loc and scale |
| RFn_estimators | 10, 90 | Two Integers: loc and scale |
| RFmin_samples_split | 2, 3 | Two Integers: loc and scale |
| RFmax_depth | 5, 5 | Two Integers: loc and scale |
| LRpenalty | l2, l1 | none, l2, l1 |
| LRC | 0.01, 1.0 | Two Integers: loc and scale |
| LDA_solver | svd, lsqr, eigen | svd, lsqr, eigen |
| LDA_shrinkage | -5, 5 | Two Integers: loc and scale |
| QDA_reg_param | -5, 5 | Two Integers: loc and scale |
| ElasticNet_alpha | -5, 5 | Two Integers: loc and scale |
| ElasticNet_l1_ratio | 0, 1 | Two Integers: loc and scale |
| SGD_alpha | -5, 5 | Two Integers: loc and scale |
| SGD_l1_ratio | 0, 1 | Two Integers: loc and scale |
| SGD_loss | hinge, squared_hinge, modified_huber | hinge, squared_hinge, modified_huber |
| SGD_penalty | none, l2, l1 | none, l2, l1 |
| CNB_alpha | 0, 1 | Two Integers: loc and scale |
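For example, to restrict the CASH search to random forests and logistic regression, and make the random forest twice as likely to be sampled (see the note on repetition earlier in this chapter); a minimal sketch:

>>> config['Classification']['classifiers'] = 'RF, RF, LR'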
CrossValidation¶
When using the PREDICT toolbox for classification and cross-validation is enabled, specify the following settings.
Description:
| Subkey | Description |
|---|---|
| N_iterations | Number of times the data is split in training and test in the outer cross-validation. |
| test_size | The percentage of data to be used for testing. |
| fixed_seed | If True, use a fixed seed for the cross-validation splits. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| N_iterations | 100 | Integer |
| test_size | 0.2 | Float |
| fixed_seed | False | Boolean |
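For example, for a quick test run with fewer cross-validation iterations and a larger test set; a minimal sketch:

>>> config['CrossValidation']['N_iterations'] = '10'
>>> config['CrossValidation']['test_size'] = '0.3'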
Labels¶
When using the PREDICT toolbox for classification, you have to set the label used for classification.
This part is really important, as it should match your label file. Suppose the patientclass.txt file you supplied as the label source looks like this:
| Patient | Label1 | Label2 |
|---|---|---|
| patient1 | 1 | 0 |
| patient2 | 2 | 1 |
| patient3 | 1 | 5 |
You can supply a single label or multiple labels split by commas, for each of which an estimator will be fit. For example, suppose you simply want to use Label1 for classification, then set:
config['Labels']['label_names'] = 'Label1'
If you want to first train a classifier on Label1 and then on Label2, set:
config['Labels']['label_names'] = 'Label1, Label2'
Description:
| Subkey | Description |
|---|---|
| label_names | The labels used from your label file for classification. |
| modus | Determine whether multilabel or singlelabel classification or regression will be performed. |
| url | WIP |
| projectID | WIP |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| label_names | Label1, Label2 | String(s) |
| modus | singlelabel | singlelabel, multilabel |
| url | WIP | WIP |
| projectID | WIP | WIP |
Hyperoptimization¶
When using the PREDICT toolbox for classification, you have to supply your hyperparameter optimization procedure here.
Description:
| Subkey | Description |
|---|---|
| scoring_method | Specify the optimization metric for your hyperparameter search. |
| test_size | Size of the test set in the hyperoptimization cross-validation, given as a percentage of the whole dataset. |
| n_splits | Number of iterations in the train-validation cross-validation used for model optimization. |
| N_iterations | Number of iterations used in the hyperparameter optimization. This corresponds to the number of samples drawn from the parameter grid. |
| n_jobspercore | Number of jobs assigned to a single core. Only used if fastr is set to True in the Classification section. |
| maxlen | Number of estimators for which the fitted outcomes and parameters are saved. Increasing this number will increase the memory usage. |
| ranking_score | Score used for ranking the performance of the evaluated workflows. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| scoring_method | f1_weighted | Any sklearn metric |
| test_size | 0.15 | Float |
| n_splits | 5 | Integer |
| N_iterations | 10000 | Integer |
| n_jobspercore | 2000 | Integer |
| maxlen | 100 | Integer |
| ranking_score | test_score | String |
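For example, to optimize on the AUC instead of the weighted F1-score and draw fewer samples from the parameter grid for a quick test; a minimal sketch, assuming the key is spelled HyperOptimization as in the Contents listing (any sklearn metric name, such as roc_auc, is accepted according to the table above):

>>> config['HyperOptimization']['scoring_method'] = 'roc_auc'
>>> config['HyperOptimization']['N_iterations'] = '1000'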
FeatureScaling¶
Determines which method is applied to scale each feature.
Description:
| Subkey | Description |
|---|---|
| scale_features | Determine whether to use feature scaling or not. |
| scaling_method | Determine the scaling method. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| scale_features | True | Boolean(s) |
| scaling_method | z_score | z_score, minmax, robust |
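For example, to use robust scaling instead of z-scoring; a minimal sketch:

>>> config['FeatureScaling']['scaling_method'] = 'robust'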
SampleProcessing¶
Before performing the hyperoptimization, you can use SMOTE (Synthetic Minority Over-sampling Technique) to oversample your data.
Description:
| Subkey | Description |
|---|---|
| SMOTE | Determine whether to use SMOTE oversampling, see also the imbalanced-learn documentation (https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html). |
| SMOTE_ratio | Determine the ratio of oversampling. If 1, the minority class will be oversampled to the same size as the majority class. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| SMOTE_neighbors | Number of neighbors used in SMOTE. This should be much smaller than the number of objects/patients you supply. We sample on a uniform scale: the parameters specify the range (a, a + b). |
| Oversampling | Determine whether to use random oversampling. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| SMOTE | True, False | Boolean(s) |
| SMOTE_ratio | 1, 0 | Two Integers: loc and scale |
| SMOTE_neighbors | 5, 15 | Two Integers: loc and scale |
| Oversampling | False | Boolean(s) |
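For example, to always apply SMOTE with the default 1:1 ratio instead of letting the optimization decide; a minimal sketch:

>>> config['SampleProcessing']['SMOTE'] = 'True'
>>> config['SampleProcessing']['SMOTE_ratio'] = '1, 0'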
Ensemble¶
WORC supports ensembling of workflows. This is not a default approach in radiomics, hence the default is to not use it and select only the best performing workflow.
Description:
| Subkey | Description |
|---|---|
| Use | Determine whether to use ensembling or not. Provide an integer to state how many estimators to include: 1 equals no ensembling. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| Use | 50 | Integer |
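For example, to disable ensembling and only use the single best performing workflow; a minimal sketch:

>>> config['Ensemble']['Use'] = '1'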
Evaluation¶
In the evaluation of the performance, several adjustments can be made.
Description:
| Subkey | Description |
|---|---|
| OverfitScaler | Whether to fit a separate scaler on the test set (which means overfitting) or to use the scaler fitted on the training dataset. Only used for experimental purposes: never overfit your scaler for the actual performance evaluation. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| OverfitScaler | False | True, False |
Bootstrap¶
Besides cross validation, WORC supports bootstrapping on the test set for performance evaluation.
Description:
| Subkey | Description |
|---|---|
| Use | Determine whether to use bootstrapping or not. |
| N_iterations | Number of iterations to use for bootstrapping. |
Defaults and Options:
| Subkey | Default | Options |
|---|---|---|
| Use | False | Boolean |
| N_iterations | 100 | Integer |
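For example, to enable bootstrapping of the test set with 1000 iterations; a minimal sketch:

>>> config['Bootstrap']['Use'] = 'True'
>>> config['Bootstrap']['N_iterations'] = '1000'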