Radiomics Features¶
WORC is not a feature extraction toolbox, but a workflow management and, foremost, workflow optimization toolbox. However, feature extraction is generally part of the workflow. Users can add their own feature toolbox, but the toolboxes used by default are PREDICT and PyRadiomics. The options for feature extraction using these toolboxes within WORC and their defaults are described in this chapter, organized per feature group.
Here, we provide an overview of all features and an explanation of what they quantify. For a comprehensive overview of all functions and parameters, please see the Config chapter.
For all features, the feature labels reflect the descriptions named here. When parameters have to be set, the values of these parameters are included in the feature label.
For all features, you can determine whether PREDICT or PyRadiomics extracts them by changing the related parameters in config['PyRadiomics'] for PyRadiomics and config['ImageFeatures'] for PREDICT.
Furthermore, we refer the user to the following literature:
More information on PyRadiomics: Van Griethuysen, Joost JM, et al. “Computational radiomics system to decode the radiographic phenotype.” Cancer research 77.21 (2017): e104-e107.
More detailed description of many of the used features: Parekh, Vishwa, and Michael A. Jacobs. “Radiomics: a new application from established techniques.” Expert review of precision medicine and drug development 1.2 (2016): 207-226.
Overview of often used radiomics features: Zwanenburg, Alex, et al. “The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping.” Radiology 295.2 (2020): 328-338.
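In total, the defaults of WORC result in the following number of features:

Type | Number |
---|---|
Histogram | 13 |
Shape | 35 |
Orientation | 9 |
GLCM (including GLCMMS) | 144 |
GLRLM | 16 |
GLSZM | 16 |
NGTDM | 5 |
GLDM | 14 |
Gabor | 156 |
LoG | 39 |
Vessel | 39 |
LBP | 39 |
Local phase | 39 |
Total | 564 |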
Note
The settings for the parameters are included in the feature label. For example, tf_GLCM_contrastd1.0A1.57 is the contrast of the GLCM computed at a distance of 1 pixel and an angle of 1.57 radians, i.e. approximately 90 degrees.
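To illustrate how such a label can be read back, the sketch below splits the example label into its name, distance, and angle. The helper `parse_glcm_label` and its regular expression are hypothetical: the pattern is inferred from this single example label and is not part of WORC, so other labels may follow a different layout.

```python
import math
import re

# Hypothetical pattern, inferred from the one example label in the note above;
# it may not cover every WORC feature label.
LABEL_PATTERN = re.compile(
    r"^(?P<name>.+?)d(?P<distance>\d+(?:\.\d+)?)A(?P<angle>\d+(?:\.\d+)?)$"
)


def parse_glcm_label(label):
    """Split a GLCM feature label into its name, distance and angle."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unrecognized label: {label!r}")
    angle_rad = float(match.group("angle"))
    return {
        "name": match.group("name"),
        "distance": float(match.group("distance")),
        "angle_rad": angle_rad,
        "angle_deg": math.degrees(angle_rad),
    }


parsed = parse_glcm_label("tf_GLCM_contrastd1.0A1.57")
print(parsed["name"], parsed["distance"], round(parsed["angle_deg"]))
```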
Histogram features¶
Histogram features are based on the image intensities themselves. Usually, a histogram of the intensities is made, after which several first order statistics are extracted. Therefore, these features are commonly also referred to as first order or intensity features.
Both PREDICT and PyRadiomics include similar first order features. We have therefore chosen to only use PREDICT by default to avoid redundant features. PREDICT extracts the following features using a histogram with 50 bins:
Minimum (defined as the 2nd percentile for robustness)
Maximum (defined as the 98th percentile for robustness)
Range
Interquartile range
Standard deviation
Skewness
Kurtosis
Peak value
Peak position
Energy
Entropy
Mean
Median
Note
The minimum, maximum, range and interquartile range are extracted from the raw data, as histogram creation may result in a loss of needed information.
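The percentile-based definitions above can be sketched in plain Python as follows. This is not PREDICT's implementation: the helper names are hypothetical, the interpolation rule is one common choice, and taking the range as the difference between the robust maximum and minimum is an assumption.

```python
def percentile(values, q):
    """Percentile with linear interpolation between sorted samples."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def robust_intensity_features(intensities):
    """Minimum, maximum, range and interquartile range from raw intensities."""
    minimum = percentile(intensities, 2)   # robust minimum
    maximum = percentile(intensities, 98)  # robust maximum
    return {
        "min": minimum,
        "max": maximum,
        # Assumption: the range is taken between the robust extremes.
        "range": maximum - minimum,
        "iqr": percentile(intensities, 75) - percentile(intensities, 25),
    }


# Hypothetical ROI intensities: 0..100 plus one extreme outlier.
roi = list(range(101)) + [10000]
features = robust_intensity_features(roi)
print(features)
```

With these definitions, the single outlier of 10000 has no effect on the reported maximum, which is the robustness the percentile-based definitions aim for.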
Shape features¶
Shape features describe morphological properties of the region of interest and are therefore solely based on the segmentation, not the image. As PREDICT and PyRadiomics offer complementary shape descriptors, both packages are used by default.
In PREDICT, these descriptors are by default extracted per 2-D slice and aggregated over all slices, as in our experience the slice thickness is often too large to create sensible 3-D shape descriptors. For each aggregated descriptor, PREDICT extracts the mean and standard deviation. Most of the shape features are based on the following papers:
The mean and standard deviation of the following shape features are extracted:
Compactness
Radial distance
Roughness
Convexity
Circular variance
Principal axis ratio (PRAX)
Elliptical variance
Solidity
Area
Additionally, the minimum and maximum area and, if pixel spacing is included in the image or metadata, the volume are computed, for a total of 21 shape features.
In PyRadiomics, the following shape features are extracted by default:
Elongation
Flatness
Least Axis Length
Major Axis Length
Maximum 2D diameter for columns
Maximum 2D diameter for rows
Maximum 2D diameter for slices
Maximum 3D diameter
Mesh Volume
Minor Axis Length
Sphericity
Surface Area
Surface Volume Ratio
Voxel Volume
Hence, the total number of shape features is 35.
Orientation features¶
Orientation features describe the orientation and location of the ROI. While these may not be relevant for the prediction on their own, they may serve as moderation features for orientation-dependent features. As PREDICT and PyRadiomics again provide complementary features, WORC by default uses both toolboxes for orientation feature extraction.
The following orientation features are extracted from PREDICT:
X-angle
Y-angle
Z-angle
The angles are extracted by fitting a 3D ellipsoid to the ROI and using the orientations of the three major axes.
The following orientation features are extracted from PyRadiomics using the Center Of Mass (COM):
COM index x
COM index y
COM index z
COM x
COM y
COM z
Texture features¶
The last group is the largest and basically contains all features not within the other groups, as a feature quantifying a form of texture is a broad definition. Within the texture features, there are several sub-groups. If groupwise feature selection is used, each of these subgroups has an on/off hyperparameter.
Note that we have decided to split several groups from the texture features. Within the texture features, we have included more commonly used texture features, as these are indeed commonly grouped under texture features. The less well-known features are described later on in this chapter.
Gray-Level Co-occurrence Matrix (GLCM)¶
The GLCM and other gray-level based matrix features are based on a discretized version of the image, i.e. the gray-level matrix. The config['ImageFeatures']['GLCM_levels'] parameter determines the number of levels for the discretization. By default, WORC uses 16 levels, as this works in smaller ROIs containing fewer regions but does not throw away too much information in larger regions.
The GLCM counts the co-occurrences of neighbouring pixels of each gray level value using two parameters: the distance between pixels, and the angle along which co-occurrences are counted. As it is generally not known beforehand which of these settings may lead to relevant features, the GLCM is extracted at multiple values:
config['ImageFeatures']['GLCM_angles'] = '0, 0.79, 1.57, 2.36'
config['ImageFeatures']['GLCM_distances'] = '1, 3'
Both PREDICT and PyRadiomics can extract GLCM features. Again, we would like to extract the GLCM per 2D slice, similar to the shape features. We therefore use PREDICT by default, as PREDICT provides two ways to do so: compute the GLCM and its features per slice and aggregate the features, or aggregate the GLCMs of all slices and compute the features once, which PREDICT calls GLCM Multi Slice (GLCMMS) features. For both the GLCM and the GLCMMS, PREDICT extracts the following features for all combinations of angles and distances:
Contrast
Dissimilarity
Homogeneity
Angular Second Moment (ASM)
Energy
Correlation
In total, computing these six features for both the GLCM and GLCMMS for all combinations of angles and distances results in a total of 144 features.
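As a rough illustration of the idea, the sketch below discretizes a small 2-D slice, counts co-occurrences for one offset, and computes the contrast. It is a plain-Python sketch and not PREDICT's implementation: the function names, the linear discretization, the non-symmetric counting, and the toy slice are all assumptions made for the example.

```python
def discretize(image, levels):
    """Map intensities linearly onto integer gray levels 0..levels-1."""
    flat = [v for row in image for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1  # avoid division by zero for constant images
    return [
        [min(int((v - low) / span * levels), levels - 1) for v in row]
        for row in image
    ]


def glcm(gray, levels, d_row, d_col):
    """Count co-occurrences of gray levels at a fixed pixel offset."""
    matrix = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(gray), len(gray[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                matrix[gray[r][c]][gray[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: squared level difference weighted by probability."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical 2-D slice; offset (0, 1) pairs each pixel with its right-hand
# neighbour, i.e. distance 1 along the horizontal direction.
slice_2d = [[0, 0, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30], [20, 20, 30, 30]]
gray = discretize(slice_2d, levels=4)
matrix = glcm(gray, levels=4, d_row=0, d_col=1)
print(contrast(matrix))
```

Repeating this for every slice and averaging the features corresponds to the per-slice approach, whereas summing the matrices of all slices before computing the features corresponds to the multi-slice approach described above.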
Gray-Level Run Length Matrix (GLRLM)¶
The GLRLM counts how many lines (runs) of a certain gray level and length occur in a specific direction. The only parameter of the GLRLM is thus the direction, for which we use the PyRadiomics default. In PREDICT, the GLRLM is extracted using PyRadiomics, so WORC relies on using PyRadiomics directly.
The following GLRLM features are extracted by default:
Gray level non-uniformity
Gray level non-uniformity normalized
Gray level variance
High gray level run emphasis
Long run emphasis
Long run high gray level emphasis
Long run low gray level emphasis
Low gray level run emphasis
Run entropy
Run length non-uniformity
Run length non-uniformity normalized
Run percentage
Run variance
Short run emphasis
Short run high gray level emphasis
Short run low gray level emphasis
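The run counting that underlies these features can be sketched in a few lines. This is a plain-Python illustration for the horizontal direction only, not the PyRadiomics implementation; the function names and the toy image are hypothetical. Short run emphasis is computed with its usual definition, weighting each run by one over its squared length and averaging over all runs.

```python
from itertools import groupby


def run_length_counts(gray):
    """Count horizontal runs: {(gray_level, run_length): occurrences}."""
    counts = {}
    for row in gray:
        for level, run in groupby(row):
            key = (level, len(list(run)))
            counts[key] = counts.get(key, 0) + 1
    return counts


def short_run_emphasis(counts):
    """Weight each run by 1 / length^2 and average over all runs."""
    n_runs = sum(counts.values())
    return sum(n / length ** 2 for (_, length), n in counts.items()) / n_runs


# Hypothetical, already discretized 2-D image.
gray = [[0, 0, 0, 1], [1, 1, 2, 2], [3, 3, 3, 3]]
counts = run_length_counts(gray)
sre = short_run_emphasis(counts)
print(counts, sre)
```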
Gray-Level Size Zone Matrix (GLSZM)¶
The GLSZM counts how many areas of a certain gray level and size occur. It therefore has no parameters. The GLSZM is in PREDICT extracted using PyRadiomics, so WORC relies on directly using PyRadiomics.
The following GLSZM features are by default extracted:
Gray level non-uniformity
Gray level non-uniformity normalized
Gray level variance
High gray level zone emphasis
Large area emphasis
Large area high gray level emphasis
Large area low gray level emphasis
Low gray level zone emphasis
Zone entropy
Size zone non-uniformity
Size zone non-uniformity normalized
Zone percentage
Zone variance
Small area emphasis
Small area high gray level emphasis
Small area low gray level emphasis
Gray Level Dependence Matrix (GLDM)¶
The GLDM determines how many voxels in a neighborhood depend on (i.e. are similar to) the centre voxel. Parameters include the distance that defines the neighborhood and the similarity threshold. The GLDM is also extracted using PyRadiomics, and its defaults are therefore used.
The following GLDM features are used:
Dependence Entropy
Dependence Non-Uniformity
Dependence Non-Uniformity Normalized
Dependence Variance
Gray Level Non-Uniformity
Gray Level Variance
High Gray Level Emphasis
Large Dependence Emphasis
Large Dependence High Gray Level Emphasis
Large Dependence Low Gray Level Emphasis
Low Gray Level Emphasis
Small Dependence Emphasis
Small Dependence High Gray Level Emphasis
Small Dependence Low Gray Level Emphasis
Neighborhood Gray Tone Difference Matrix (NGTDM)¶
The NGTDM looks at the difference between a pixel's gray value and that of its neighborhood within a certain distance, which is the only parameter. The NGTDM is also extracted using PyRadiomics, and its default is therefore used.
The following NGTDM features are extracted:
Busyness
Coarseness
Complexity
Contrast
Strength
Gabor filter features¶
These features are extracted through PREDICT by first applying a set of Gabor filters to the image with the following parameters:
config['ImageFeatures']['gabor_frequencies'] = '0.05, 0.2, 0.5'
config['ImageFeatures']['gabor_angles'] = '0, 45, 90, 135'
The angles are equal to the GLCM angles, but are given in degrees. For each unique combination of angle and frequency, the image is filtered per 2-D axial slice, after which the PREDICT histogram features as discussed earlier are extracted from the filtered images.
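The sketch below shows how the comma-separated config strings above expand into filter settings. The `config` dictionary is a hypothetical stand-in for the relevant part of a WORC configuration, and `parse_floats` is an illustrative helper, not a WORC function.

```python
from itertools import product

# Hypothetical stand-in for the relevant part of a WORC config.
config = {
    "ImageFeatures": {
        "gabor_frequencies": "0.05, 0.2, 0.5",
        "gabor_angles": "0, 45, 90, 135",
    }
}


def parse_floats(text):
    """Turn a comma-separated config string into a list of floats."""
    return [float(item) for item in text.split(",")]


frequencies = parse_floats(config["ImageFeatures"]["gabor_frequencies"])
angles_deg = parse_floats(config["ImageFeatures"]["gabor_angles"])

# One filtered image per unique (frequency, angle) combination.
filter_settings = list(product(frequencies, angles_deg))
n_histogram_features = 13  # the PREDICT histogram features listed earlier
n_gabor_features = len(filter_settings) * n_histogram_features
print(len(filter_settings), n_gabor_features)
```

Three frequencies and four angles give 12 filtered images, and 13 histogram features per filtered image give 156 features, a count that also appears in the overview table at the start of this chapter.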
Laplacian of Gaussian (LoG) filter features¶
Similar to the Gabor features, these features are extracted after filtering the image, now with a LoG filter. WORC includes the width of the Gaussian part of the filter as a parameter:
config['ImageFeatures']['log_sigma'] = '1, 5, 10'
Again, for all sigma values, the images are filtered per 2-D slice, after which the PREDICT histogram features as discussed earlier are extracted from the filtered images.
Vessel filter features¶
Similar to the Gabor features, these features are extracted after filtering the image, now using a so-called vessel filter from the following paper:
As the filter triggers on tubular structures, it may be used to detect not only vessels but any tube-like structure. The following parameters are used, see also the paper:
config['ImageFeatures']['vessel_scale_range'] = '1, 10'
config['ImageFeatures']['vessel_scale_step'] = '2'
config['ImageFeatures']['vessel_radius'] = '5'
As in several applications we were interested in vessel structures in the core of the ROI, WORC splits the ROI in an inner and outer part using the vessel_radius parameter.
Again, for all parameter combinations, the images are filtered per 2-D slice and the PREDICT histogram features as discussed earlier are extracted from the filtered images. This is done for the full ROI, the inner region, and the outer region.
Local Binary Patterns (LBP)¶
We recommend the following article for information about LBPs:
Again, a range of parameters is used to compute the LBP:
config['ImageFeatures']['LBP_radius'] = '3, 8, 15'
config['ImageFeatures']['LBP_npoints'] = '12, 24, 36'
For all parameter combinations, as each npoints corresponds to a radius setting, the images are “filtered” (the LBP produces an image with the same dimensions as the original, similar to a filtering operation) per 2-D slice and the PREDICT histogram features as discussed earlier are extracted from the filtered images, both for the inner and outer region.
Local phase features¶
In many imaging modalities, e.g. MRI, the intensity scale varies a lot per image. Therefore, using intensity information may not be relevant: changes in contrast in local regions may be more relevant. Therefore, PREDICT includes features based on local phase, which transforms the image to an intensity invariant phase by looking at fluctuations or the phase of the intensity in a local region. On these local phase images, measures based on congruency or symmetry of phase may result in relevant features. For more information, please see the work of Peter Kovesi.
The local phase computation serves as a filter, with the following parameters:
config['ImageFeatures']['phase_minwavelength'] = '3'
config['ImageFeatures']['phase_nscale'] = '5'
Again, for all parameter combinations, the images are filtered per 2-D slice and the PREDICT histogram features as discussed earlier are extracted from the filtered images. This is done for the local phase, phase congruency, and phase symmetry.
DICOM features¶
In PREDICT, several features may be extracted from DICOM headers, which can be provided in the metadata source. By default, these include:
(0010, 1010): Patient age
(0010, 0040): Patient sex
You can define which tags you want to extract and how to name these features by altering the following in the config:
config['ImageFeatures']['dicom_feature_tags'] = '0010 1010, 0010 0040'
config['ImageFeatures']['dicom_feature_labels'] = 'age, sex'
Note that the value will be converted to a float. If that's not possible, or the tag is not present, numpy.NaN will be used instead.
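The behaviour described above can be sketched as follows. This is not PREDICT's code: the `config` dictionary, the `header` dictionary, and both helper functions are hypothetical, and `math.nan` stands in for numpy.NaN to keep the example dependency-free.

```python
import math

# Hypothetical stand-in for the relevant part of a WORC config.
config = {
    "ImageFeatures": {
        "dicom_feature_tags": "0010 1010, 0010 0040",
        "dicom_feature_labels": "age, sex",
    }
}


def parse_tags(text):
    """Convert '0010 1010, 0010 0040' into [(0x0010, 0x1010), ...]."""
    tags = []
    for item in text.split(","):
        group, element = item.split()
        tags.append((int(group, 16), int(element, 16)))
    return tags


def to_float_or_nan(value):
    """Convert a header value to float, falling back to NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


image_config = config["ImageFeatures"]
tags = parse_tags(image_config["dicom_feature_tags"])
labels = [s.strip() for s in image_config["dicom_feature_labels"].split(",")]

# Hypothetical header values; a missing tag is represented by None here.
header = {(0x0010, 0x1010): "63", (0x0010, 0x0040): None}
features = {
    label: to_float_or_nan(header.get(tag)) for label, tag in zip(labels, tags)
}
print(features)
```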
Other features you may want to include:
(0008, 0070): Scanner manufacturer
(0018, 0022): Scan options, see below
(0018, 0050): Slice thickness
(0018, 0080): Repetition time (MRI)
(0018, 0081): Echo time (MRI)
(0018, 0087): Magnetic field strength (MRI)
(0018, 1314): Flip angle (MRI)
(0028, 0030): Pixel spacing
Several routines for converting values to floats have been defined for the following features:
(0008, 0070) (Scanner manufacturer): 0 = Siemens, 1 = Philips, 2 = General Electric, 3 = Toshiba. If not one of these, numpy.NaN is used.
(0018, 0022) (Scan options): if the feature name is 'FatSat', determine from the scan options whether or not the scan has been made with fat saturation.
(0010, 0040) (Patient sex): M = 0, F = 1.
(0018, 0087) (Magnetic field strength): 5000 = 0.5, 10000 = 1.0, 15000 = 1.5, 30000 = 3.0. If not convertible to float, numpy.NaN is used.
(0028, 0030) (Pixel spacing): use the first value and convert it to float.
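A minimal sketch of such conversion routines is given below. It only mirrors the mappings listed above and is not PREDICT's implementation: the function names are hypothetical, exact case-insensitive matching of the manufacturer name is an assumption, and `math.nan` stands in for numpy.NaN.

```python
import math

MANUFACTURER_CODES = {
    "siemens": 0.0,
    "philips": 1.0,
    "general electric": 2.0,
    "toshiba": 3.0,
}
SEX_CODES = {"M": 0.0, "F": 1.0}
FIELD_STRENGTH_TESLA = {5000.0: 0.5, 10000.0: 1.0, 15000.0: 1.5, 30000.0: 3.0}


def convert_manufacturer(value):
    # Assumption: exact, case-insensitive match on the manufacturer name.
    return MANUFACTURER_CODES.get(str(value).strip().lower(), math.nan)


def convert_sex(value):
    return SEX_CODES.get(str(value).strip().upper(), math.nan)


def convert_field_strength(value):
    """Map the listed large values to tesla; keep other floats as they are."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return math.nan
    return FIELD_STRENGTH_TESLA.get(number, number)


def convert_pixel_spacing(value):
    """Use the first value; assumes a list or tuple of numbers."""
    if isinstance(value, (list, tuple)) and value:
        return float(value[0])
    return math.nan


print(convert_manufacturer("Philips"), convert_field_strength("15000"))
```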
Semantic features¶
WORC allows the user to provide non-computational features, which are called semantic features. These can be given to WORC as an Excel file, in which each column represents a feature. See the User manual chapter for more details on providing these features.
Other extraction choices¶
Filtering on ROI or full image.¶
For all filter-based features, the full image is filtered first, after which the features are extracted from the region of interest (ROI). Filtering only the ROI would result in edge artefacts. A drawback could be that the ROI surroundings now influence the feature, but this can also be a benefit, as a comparison between the ROI and its surroundings could give relevant information.
Feature extraction parameter selection¶
Many of the extracted features have parameters to be set. For each application, the most suitable set of parameters may vary. Therefore, in WORC, by default many features are extracted at a range of parameters. We hypothesize that in the next steps, e.g. feature selection and classification, the most relevant features will be automatically used.
Wavelet features¶
PyRadiomics supports the extraction of so-called wavelet features by first applying a set of filters to the image before extracting the above-mentioned features. The number of features therefore quickly expands when using wavelet features, while we have not noticed improvements in our experiments. Hence, to save computation time, we have decided to only include original features in WORC. Usage of wavelet features is however supported, both in feature extraction and selection, see the Config chapter.
Fixed bin width vs fixed bin size¶
For all gray level matrix based features, WORC by default uses a fixed number of bins (a fixed bin count, cf. the GLCM_levels parameter), while PyRadiomics argues for using a fixed bin width. The reason is that we want the WORC default settings to work in a wide variety of applications, including those with images in arbitrary scales, which often happens when using MRI. In these cases, using a fixed bin width may lead to odd feature values and even errors.
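The effect of arbitrary intensity scales on the two discretization strategies can be sketched as follows. The helper functions, the toy intensities, and the bin width of 25 are assumptions chosen for illustration; this is not the discretization code of WORC or PyRadiomics.

```python
def n_levels_fixed_width(values, bin_width):
    """Number of gray levels spanned when each bin has a fixed width."""
    return int((max(values) - min(values)) // bin_width) + 1


def discretize_fixed_count(values, n_bins):
    """Map values linearly onto n_bins gray levels, whatever their scale."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for constant input
    return [min(int((v - low) / span * n_bins), n_bins - 1) for v in values]


# Hypothetical intensities of the same tissue on two arbitrary MRI scales.
scan_b = [400, 1600, 2800, 4000]
scan_a = [v / 4000 for v in scan_b]

# A fixed width collapses scan_a into a single level and spreads scan_b
# over many levels.
levels_a = n_levels_fixed_width(scan_a, bin_width=25)
levels_b = n_levels_fixed_width(scan_b, bin_width=25)

# A fixed count of 16 gives both scans the same discretization.
same = discretize_fixed_count(scan_a, 16) == discretize_fixed_count(scan_b, 16)
print(levels_a, levels_b, same)
```

With a fixed width, the first scan ends up with a single gray level, which makes most gray level matrix features degenerate, while the second scan gets far more levels than a small ROI can populate.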