plotting Package

compute_CI Module

WORC.plotting.compute_CI.compute_confidence(metric, N_train, N_test, alpha=0.95)[source]

Calculate the adjusted confidence interval for cross-validation results.

Parameters

metric: numpy array, mandatory

Contains the result of a metric for the different cross-validation iterations (e.g., if 20 cross-validations are performed, an array of length 20 with the accuracy of each iteration).

N_train: integer, mandatory

Number of training samples.

N_test: integer, mandatory

Number of test samples.

alpha: float, default 0.95

Value between 0 and 1; the alpha*100% confidence interval is computed.
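
The exact adjustment is not spelled out here; the sketch below assumes the corrected resampled t-test of Nadeau and Bengio, a common way to widen cross-validation confidence intervals, and uses illustrative names (corrected_ci is not part of WORC):

    import numpy as np
    from scipy import stats

    def corrected_ci(metric, N_train, N_test, alpha=0.95):
        # Corrected resampled t-interval: the variance is inflated by
        # N_test / N_train to account for the overlap between the training
        # sets of the different cross-validation iterations.
        metric = np.asarray(metric, dtype=float)
        n = len(metric)
        variance = metric.var(ddof=1) * (1.0 / n + N_test / N_train)
        half_width = stats.t.ppf(1 - (1 - alpha) / 2, n - 1) * np.sqrt(variance)
        return metric.mean() - half_width, metric.mean() + half_width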

WORC.plotting.compute_CI.compute_confidence_bootstrap(bootstrap_metric, test_metric, N_1, alpha=0.95)[source]

Calculate a confidence interval for bootstrapped samples.

Parameters

bootstrap_metric: numpy array, mandatory

Contains the result of a metric for the different bootstrap iterations.

test_metric: float, mandatory

Value of the metric evaluated on the true, full test set.

alpha: float, default 0.95

Value between 0 and 1; the alpha*100% confidence interval is computed.
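
A minimal sketch of one plausible reading, a percentile interval recentred on the full-test-set value (bootstrap_ci is illustrative, N_1 is not used here, and the actual implementation may differ):

    import numpy as np

    def bootstrap_ci(bootstrap_metric, test_metric, alpha=0.95):
        # Centre the bootstrap distribution on the full-test-set value, then
        # take the (1 - alpha) / 2 and (1 + alpha) / 2 percentiles as bounds.
        deltas = np.asarray(bootstrap_metric, dtype=float)
        deltas = deltas - deltas.mean()
        lower = test_metric + np.percentile(deltas, 100 * (1 - alpha) / 2)
        upper = test_metric + np.percentile(deltas, 100 * (1 + alpha) / 2)
        return lower, upper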

WORC.plotting.compute_CI.compute_confidence_logit(metric, N_train, N_test, alpha=0.95)[source]

Calculate the adjusted confidence interval (logit variant).

Parameters

metric: numpy array, mandatory

Contains the result of a metric for the different cross-validation iterations (e.g., if 20 cross-validations are performed, an array of length 20 with the accuracy of each iteration).

N_train: integer, mandatory

Number of training samples.

N_test: integer, mandatory

Number of test samples.

alpha: float, default 0.95

Value between 0 and 1; the alpha*100% confidence interval is computed.
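
Judging by the name, this variant works in logit space; a minimal sketch under that assumption (logit_ci is illustrative), applying the same corrected t-interval as compute_confidence but on logit-transformed values so the back-transformed interval stays within [0, 1]:

    import numpy as np
    from scipy import stats
    from scipy.special import expit, logit

    def logit_ci(metric, N_train, N_test, alpha=0.95):
        # Transform to logit space, compute the corrected t-interval there,
        # and transform the bounds back with the logistic function.
        z = logit(np.clip(np.asarray(metric, dtype=float), 1e-6, 1 - 1e-6))
        n = len(z)
        variance = z.var(ddof=1) * (1.0 / n + N_test / N_train)
        half_width = stats.t.ppf(1 - (1 - alpha) / 2, n - 1) * np.sqrt(variance)
        return expit(z.mean() - half_width), expit(z.mean() + half_width)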

linstretch Module

WORC.plotting.linstretch.linstretch(i, i_max=255, i_min=0)[source]

Linearly stretch the pixel values of the input image i to the range [i_min, i_max].
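
A linear stretch is a plain min-max rescale; a minimal sketch (linstretch_sketch is illustrative, and the WORC implementation may handle edge cases such as constant images differently):

    import numpy as np

    def linstretch_sketch(i, i_max=255, i_min=0):
        # Rescale the pixel values linearly to the range [i_min, i_max].
        i = np.asarray(i, dtype=float)
        return (i - i.min()) / (i.max() - i.min()) * (i_max - i_min) + i_min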

plot_ROC Module

WORC.plotting.plot_ROC.curve_thresholding(metric1t, metric2t, thresholds, nsamples=20)[source]

Compute the metric1 and metric2 ratios (either FPR and TPR, or TPR and precision) at different thresholds on the scores of an estimator.
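
A minimal sketch of such thresholding for the FPR/TPR case (ratios_at_thresholds is illustrative; curve_thresholding itself may sample the thresholds differently):

    import numpy as np

    def ratios_at_thresholds(y_truth, y_score, thresholds):
        # Binarize the scores at each threshold and compute the FPR/TPR pair.
        y_truth = np.asarray(y_truth)
        y_score = np.asarray(y_score)
        fprs, tprs = [], []
        for threshold in thresholds:
            y_pred = y_score >= threshold
            tprs.append(np.sum(y_pred & (y_truth == 1)) / np.sum(y_truth == 1))
            fprs.append(np.sum(y_pred & (y_truth == 0)) / np.sum(y_truth == 0))
        return np.array(fprs), np.array(tprs)

    # Analogous to nsamples=20: evenly spaced thresholds over the score range,
    # e.g. thresholds = np.linspace(y_score.min(), y_score.max(), 20)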

WORC.plotting.plot_ROC.main()[source]
WORC.plotting.plot_ROC.plot_PRC_CIc(y_truth, y_score, N_1, N_2, plot='default', alpha=0.95, verbose=False, DEBUG=False, tsamples=20)[source]

Plot a Precision-Recall curve with confidence intervals.

tsamples: number of sample points at which to determine the confidence intervals. The sample points are used as thresholds on y_score.

WORC.plotting.plot_ROC.plot_ROC(prediction, pinfo, ensemble_method='top_N', ensemble_size=1, label_type=None, ROC_png=None, ROC_tex=None, ROC_csv=None, PRC_png=None, PRC_tex=None, PRC_csv=None)[source]
WORC.plotting.plot_ROC.plot_ROC_CIc(y_truth, y_score, N_1, N_2, plot='default', alpha=0.95, verbose=False, DEBUG=False, tsamples=20)[source]

Plot a Receiver Operating Characteristic (ROC) curve with confidence intervals.

tsamples: number of sample points at which to determine the confidence intervals. The sample points are used as thresholds on y_score.

WORC.plotting.plot_ROC.plot_single_PRC(y_truth, y_score, verbose=False, returnplot=False)[source]

Get the precision and recall (= true positive rate) for the ground truth and scores of a single estimator. These ratios can be used to plot a Precision-Recall Curve (PRC).

WORC.plotting.plot_ROC.plot_single_ROC(y_truth, y_score, verbose=False, returnplot=False)[source]

Get the False Positive Rate (FPR) and True Positive Rate (TPR) for the ground truth and scores of a single estimator. These ratios can be used to plot a Receiver Operating Characteristic (ROC) curve.
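
For intuition, the same ratios can be obtained with scikit-learn; this is only a conceptual equivalent, not the WORC implementation:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve

    y_truth = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    fpr, tpr, thresholds = roc_curve(y_truth, y_score)
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.show()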

plot_barchart Module

WORC.plotting.plot_barchart.count_parameters(parameters)[source]
WORC.plotting.plot_barchart.main()[source]
WORC.plotting.plot_barchart.paracheck(parameters)[source]
WORC.plotting.plot_barchart.plot_barchart(prediction, estimators=10, label_type=None, output_tex=None, output_png=None)[source]

Make a barchart of the top X hyperparameter settings of the ranked estimators over all cross-validation iterations; a usage sketch follows the parameter list.

Parameters

prediction: filepath, mandatory

Path pointing to the .hdf5 file which is the output of the trainclassifier function.

estimators: integer, default 10

Number of hyperparameter settings/estimators used from each cross-validation iteration. The settings are ranked, so when supplying e.g. 10, the best 10 settings of each cross-validation iteration will be used.

label_type: string, default None

The name of the label predicted by the estimator. If None, the first label from the prediction file will be used.

output_tex: filepath, optional

If given, the barchart will be written to this tex file.

output_png: filepath, optional

If given, the barchart will be written to this png file.

Returns

fig: matplotlib figure

The figure in which the barchart is plotted.
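
A hedged usage sketch based on the signature documented above; the file paths are placeholders:

    from WORC.plotting.plot_barchart import plot_barchart

    fig = plot_barchart(
        prediction='/path/to/classification.hdf5',  # output of trainclassifier
        estimators=10,                              # best 10 settings per iteration
        label_type=None,                            # use the first label in the file
        output_png='/path/to/barchart.png',
    )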

WORC.plotting.plot_barchart.plot_bars(params, normalization_factor=None, figwidth=40, fontsize=30, spacing=2)[source]

plot_boxplot_features Module

WORC.plotting.plot_boxplot_features.generate_feature_boxplots(image_features, label_data, output_zip, dpi=500, verbose=False)[source]

Generate boxplots of the feature values among different objects.

Parameters

image_features: list, mandatory

List with a dictionary of the feature labels and values for each patient.

label_data: pandas dataframe, mandatory

Dataframe containing the labels of the objects.

output_zip: filepath, mandatory

Zip file to which the output boxplots should be written.

WORC.plotting.plot_boxplot_features.plot_boxplot_features(features, label_data, config, output_zip, label_type=None, verbose=False)[source]

plot_boxplot_performance Module

WORC.plotting.plot_boxplot_performance.generate_performance_boxplots(performances, metrics, outputfolder, colors=None)[source]

Generate boxplots for performance of various models.

WORC.plotting.plot_boxplot_performance.test()[source]

Test functionality with synthetic data.

plot_errors Module

WORC.plotting.plot_errors.plot_errors(featurefiles, patientinfo, label_type, featurenames, posteriors_csv=None, agesex=True, output_png=None, output_tex=None)[source]

Scatterplot of all objects with marking of errors.

plot_estimator_performance Module

WORC.plotting.plot_estimator_performance.combine_multiple_estimators(predictions, label_data, N_1, N_2, multilabel_type, label_types, ensemble=1, strategy='argmax', alpha=0.95)[source]

Combine multiple estimators in a single model.

Note: the multilabel_type labels should correspond to the ordering in label_types. Hence, if multilabel_type = 0, the prediction is label_types[0], etc.

WORC.plotting.plot_estimator_performance.compute_statistics(y_truth, y_score, y_prediction, modus, regression)[source]

Compute statistics on predictions.

WORC.plotting.plot_estimator_performance.fit_thresholds(thresholds, estimator, label_type, X_train, Y_train, ensemble_method, ensemble_size, ensemble_scoring)[source]
WORC.plotting.plot_estimator_performance.main()[source]
WORC.plotting.plot_estimator_performance.plot_estimator_performance(prediction, label_data, label_type, crossval_type=None, alpha=0.95, ensemble_method='top_N', ensemble_size=100, verbose=True, ensemble_scoring=None, output=None, modus=None, thresholds=None, survival=False, shuffle_estimators=False, bootstrap=None, bootstrap_N=None, overfit_scaler=None, save_memory=True, refit_ensemble=False)[source]

Plot the output of a single estimator, e.g. an SVM. A usage sketch follows the parameter list.

Parameters

prediction: pandas dataframe or string, mandatory

output of trainclassifier function, either a pandas dataframe or a HDF5 file

label_data: string, mandatory

Contains the path referring to a .txt file containing the patient label(s) and value(s) to be used for learning. See the Github Wiki for the format.

label_type: string, mandatory

Name of the label to extract from the label data to test the estimator on.

alpha: float, default 0.95

Significance of confidence intervals.

ensemble_method: string, default ‘top_N’

Determine which method to use for creating the ensemble. Choices: top_N or Caruana

ensemble_size: int, default 100

Determine the size of the ensemble. Only relevant for top_N

verbose: boolean, default True

Print intermediate messages.

ensemble_scoring: string, default None

Metric to be used for evaluating the ensemble. If None, the option set in the prediction object will be used.

output: string, default stats

Determine which results are returned. If stats, the statistics of the estimator will be returned. If scores, the scores will be returned.

thresholds: list of integer(s), default None

If None, use the default sklearn threshold (0.5) on the posteriors to convert them to a binary prediction. If one value is provided, use that as the threshold. If two values are provided, posterior < thresh[0] = 0 and posterior > thresh[1] = 1.

Returns

Depending on the output parameter, the following outputs are returned:

If output == ‘stats’: stats: dictionary

Contains the confidence intervals of the performance metrics and the number of times each patient was classified correctly or incorrectly.

If output == ‘scores’: y_truths: list

Contains the true label for each object.

y_scores: list

Contains the score (e.g. posterior) for each object.

y_predictions: list

Contains the predicted label for each object.

pids: list

Contains the patient ID/name for each object.
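
A hedged usage sketch based on the signature and parameters documented above; the file paths and label name are placeholders:

    from WORC.plotting.plot_estimator_performance import plot_estimator_performance

    stats = plot_estimator_performance(
        prediction='/path/to/classification.hdf5',  # output of trainclassifier
        label_data='/path/to/pinfo.txt',            # patient label file
        label_type='Label1',                        # hypothetical label name
        ensemble_method='top_N',
        ensemble_size=100,
        output='stats',                             # return the statistics dictionary
    )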

plot_hyperparameters Module

WORC.plotting.plot_hyperparameters.plot_hyperparameters(prediction, label_type=None, estsize=50, output=None, removeconstants=False, verbose=False)[source]

Gather which hyperparameters have been used in the best workflows.

Parameters

prediction: pandas dataframe or string, mandatory

output of trainclassifier function, either a pandas dataframe or a HDF5 file

estsize: integer, default 50

Number of estimators that should be taken into account.

output: filename of csv, default None

Output file to write to. If None, no output file is written; the result is only returned as a variable.

removeconstants: boolean, default False

Determine whether to remove any hyperparameters which have the same value in all workflows.

verbose: boolean, default False

Whether or not to print messages.
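
A hedged usage sketch based on the signature documented above; the file paths are placeholders:

    from WORC.plotting.plot_hyperparameters import plot_hyperparameters

    hyperparameters = plot_hyperparameters(
        prediction='/path/to/classification.hdf5',  # output of trainclassifier
        estsize=50,                                 # inspect the 50 best workflows
        output='/path/to/hyperparameters.csv',      # also write the result as csv
        removeconstants=True,                       # drop parameters that never vary
    )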

plot_images Module

WORC.plotting.plot_images.bbox_2D(img, mask, padding=[1, 1], img2=None)[source]
WORC.plotting.plot_images.extract_boundary(contour, radius=2)[source]
WORC.plotting.plot_images.plot_im_and_overlay(image, mask=None, figsize=(3, 3), alpha=0.4, color='cyan', colormap='gray', colorbar=False)[source]

Plot an image in a matplotlib figure and overlay with a mask.

WORC.plotting.plot_images.slicer(image, mask=None, output_name=None, output_name_zoom=None, thresholds=[-5, 5], zoomfactor=4, dpi=500, normalize=True, expand=False, boundary=False, square=False, flip=True, rot90=0, alpha=0.4, axis='axial', index=None, color='cyan', radius=2, colormap='gray', fill=False)[source]

Plot slice of image where mask is largest, with mask as overlay.

image and mask should both be arrays
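
A hedged usage sketch; loading via SimpleITK is an assumption (the docstring only requires arrays), and the file paths are placeholders:

    import SimpleITK as sitk

    from WORC.plotting.plot_images import slicer

    # Convert the ITK images to arrays, as the docstring requires.
    image = sitk.GetArrayFromImage(sitk.ReadImage('/path/to/image.nii.gz'))
    mask = sitk.GetArrayFromImage(sitk.ReadImage('/path/to/mask.nii.gz'))
    slicer(image, mask=mask, output_name='/path/to/slice.png', axis='axial')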

plot_pvalues_features Module

WORC.plotting.plot_pvalues_features.manhattan_importance(values, labels, feature_labels, output_png=None, output_tex=None, mapping=None, threshold_annotated=0.05)[source]

plot_ranked_scores Module

WORC.plotting.plot_ranked_scores.flatten_object(input)[source]

Flatten various objects to a 1D list.

WORC.plotting.plot_ranked_scores.main()[source]
WORC.plotting.plot_ranked_scores.plot_ranked_images(pinfo, label_type, images, segmentations, ranked_truths, ranked_scores, ranked_PIDs, output_zip=None, output_itk=None, zoomfactor=4, scores='percentages')[source]
WORC.plotting.plot_ranked_scores.plot_ranked_percentages(estimator, pinfo, label_type=None, ensemble_method='top_N', ensemble_size=100, output_csv=None)[source]
WORC.plotting.plot_ranked_scores.plot_ranked_posteriors(estimator, pinfo, label_type=None, ensemble_method='top_N', ensemble_size=100, output_csv=None)[source]
WORC.plotting.plot_ranked_scores.plot_ranked_scores(estimator, pinfo, label_type, scores='percentages', images=[], segmentations=[], ensemble_method='top_N', ensemble_size=100, output_csv=None, output_zip=None, output_itk=None)[source]

Rank the patients according to their average score. The score can either be the average posterior or the percentage of times the patient was classified correctly in the cross-validation iterations. Additionally, the middle slice of each patient is plotted and saved according to the ranking. A usage sketch follows the parameter list.

Parameters

estimator: filepath, mandatory

Path pointing to the .hdf5 file which is the output of the trainclassifier function.

pinfo: filepath, mandatory

Path pointing to the .txt file which contains the patient label information.

label_type: string, default None

The name of the label predicted by the estimator. If None, the first label from the prediction file will be used.

scores: string, default percentages

Type of scoring to be used. Either ‘posteriors’ or ‘percentages’.

images: list, optional

List containing the filepaths to the ITKImage image files of the patients.

segmentations: list, optional

List containing the filepaths to the ITKImage segmentation files of the patients.

ensemble_method: string, optional

Method to be used for ensembling.

ensemble_size: int, optional

If top_N method is used, number of workflows to be included in ensemble.

output_csv: filepath, optional

If given, the scores will be written to this csv file.

output_zip: filepath, optional

If given, the images will be plotted and the pngs saved to this zip file.

output_itk: filepath, optional

WIP
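
A hedged usage sketch based on the signature and parameters documented above; the file paths and label name are placeholders:

    from WORC.plotting.plot_ranked_scores import plot_ranked_scores

    plot_ranked_scores(
        estimator='/path/to/classification.hdf5',  # output of trainclassifier
        pinfo='/path/to/pinfo.txt',                # patient label file
        label_type='Label1',                       # hypothetical label name
        scores='percentages',                      # rank by % classified correctly
        images=['/path/to/image.nii.gz'],          # optional, one path per patient
        segmentations=['/path/to/seg.nii.gz'],
        output_csv='/path/to/ranked_scores.csv',
        output_zip='/path/to/ranked_images.zip',
    )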

plotminmaxresponse Module

WORC.plotting.plotminmaxresponse.main()[source]