plotting Package¶
compute_CI Module¶
- WORC.plotting.compute_CI.compute_confidence(metric, N_train, N_test, alpha=0.95)[source]¶
Function to calculate the adjusted confidence interval for cross-validation.
- metric: numpy array containing the result of a metric for the different cross-validations (e.g. if 20 cross-validations are performed, a list of length 20 with the calculated accuracy for each cross-validation)
- N_train: integer, number of training samples
- N_test: integer, number of test samples
- alpha: float ranging from 0 to 1 to calculate the alpha*100% CI, default 0.95
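A minimal usage sketch, assuming WORC is installed; the accuracy values are illustrative, and the return format (lower and upper bound of the interval) is an assumption here:

    import numpy as np
    from WORC.plotting.compute_CI import compute_confidence

    # Illustrative accuracies from 20 cross-validation iterations
    accuracies = np.array([0.80, 0.75, 0.85, 0.78, 0.82, 0.77, 0.81, 0.79, 0.84, 0.76,
                           0.80, 0.83, 0.78, 0.81, 0.79, 0.82, 0.77, 0.80, 0.84, 0.79])

    # 95% CI, adjusted for the overlap between training sets across iterations;
    # the (lower, upper) return format is an assumption
    ci = compute_confidence(accuracies, N_train=80, N_test=20, alpha=0.95)
    print(ci)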
- WORC.plotting.compute_CI.compute_confidence_bootstrap(bootstrap_metric, test_metric, N_1, alpha=0.95)[source]¶
Function to calculate the confidence interval for bootstrapped samples.
- bootstrap_metric: numpy array containing the result of a metric for the different bootstrap iterations
- test_metric: the value of the metric evaluated on the true, full test set
- alpha: float ranging from 0 to 1 to calculate the alpha*100% CI, default 0.95
- WORC.plotting.compute_CI.compute_confidence_logit(metric, N_train, N_test, alpha=0.95)[source]¶
Function to calculate the adjusted confidence interval.
- metric: numpy array containing the result of a metric for the different cross-validations (e.g. if 20 cross-validations are performed, a list of length 20 with the calculated accuracy for each cross-validation)
- N_train: integer, number of training samples
- N_test: integer, number of test samples
- alpha: float ranging from 0 to 1 to calculate the alpha*100% CI, default 0.95
linstretch Module¶
plot_ROC Module¶
- WORC.plotting.plot_ROC.curve_thresholding(metric1t, metric2t, thresholds, nsamples=20)[source]¶
Construct metric1 and metric2 (either FPR and TPR for a ROC curve, or TPR and precision for a precision-recall curve) at different thresholds on the scores of an estimator.
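For illustration, a plain-numpy sketch of the underlying idea (not the WORC implementation itself): sweep a fixed number of thresholds over the score range and compute the two metrics, here TPR and FPR, at each:

    import numpy as np

    def roc_points(y_true, y_score, n_thresholds=20):
        # Compute (FPR, TPR) pairs at evenly spaced score thresholds
        thresholds = np.linspace(y_score.min(), y_score.max(), n_thresholds)
        fpr, tpr = [], []
        for t in thresholds:
            y_pred = (y_score >= t).astype(int)
            tp = np.sum((y_pred == 1) & (y_true == 1))
            fp = np.sum((y_pred == 1) & (y_true == 0))
            fn = np.sum((y_pred == 0) & (y_true == 1))
            tn = np.sum((y_pred == 0) & (y_true == 0))
            tpr.append(tp / (tp + fn) if tp + fn else 0.0)
            fpr.append(fp / (fp + tn) if fp + tn else 0.0)
        return np.array(fpr), np.array(tpr), thresholds

The nsamples/tsamples parameters of the functions in this module control how many such threshold points are used.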
- WORC.plotting.plot_ROC.plot_PRC_CIc(y_truth, y_score, N_1, N_2, plot='default', alpha=0.95, verbose=False, DEBUG=False, tsamples=20)[source]¶
Plot a Precision-Recall curve with confidence intervals.
- tsamples: number of sample points at which to determine the confidence intervals.
The sample points are used as thresholds on y_score.
- WORC.plotting.plot_ROC.plot_ROC(prediction, pinfo, ensemble_method='top_N', ensemble_size=1, label_type=None, ROC_png=None, ROC_tex=None, ROC_csv=None, PRC_png=None, PRC_tex=None, PRC_csv=None)[source]¶
- WORC.plotting.plot_ROC.plot_ROC_CIc(y_truth, y_score, N_1, N_2, plot='default', alpha=0.95, verbose=False, DEBUG=False, tsamples=20)[source]¶
Plot a Receiver Operating Characteristic (ROC) curve with confidence intervals.
- tsamples: number of sample points at which to determine the confidence intervals.
The sample points are used as thresholds on y_score.
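A minimal usage sketch for the ROC variant; the assumption here is that y_truth and y_score are lists with one array per cross-validation iteration, and that N_1 and N_2 are the numbers of training and test samples. plot_PRC_CIc shares the same signature for the precision-recall variant:

    import numpy as np
    from WORC.plotting.plot_ROC import plot_ROC_CIc

    rng = np.random.default_rng(42)

    # Hypothetical true labels and estimator scores for 10 cross-validation iterations
    y_truth = [rng.integers(0, 2, 20) for _ in range(10)]
    y_score = [rng.random(20) for _ in range(10)]

    plot_ROC_CIc(y_truth, y_score, N_1=80, N_2=20, alpha=0.95, tsamples=20)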
plot_barchart Module¶
- WORC.plotting.plot_barchart.plot_barchart(prediction, estimators=10, label_type=None, output_tex=None, output_png=None)[source]¶
Make a barchart of the top X hyperparameter settings of the ranked estimators over all cross-validation iterations, as in the usage sketch below.
Parameters¶
- prediction: filepath, mandatory
Path pointing to the .hdf5 file which is the output of the trainclassifier function.
- estimators: integer, default 10
Number of hyperparameter settings/estimators used in each cross-validation. The settings are ranked, so when supplying e.g. 10, the best 10 settings in each cross-validation iteration will be used.
- label_type: string, default None
The name of the label predicted by the estimator. If None, the first label from the prediction file will be used.
- output_tex: filepath, optional
If given, the barchart will be written to this tex file.
- output_png: filepath, optional
If given, the barchart will be written to this png file.
Returns¶
- fig: matplotlib figure
The figure in which the barchart is plotted.
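A minimal usage sketch; the input path is a placeholder for a trainclassifier .hdf5 output:

    from WORC.plotting.plot_barchart import plot_barchart

    # Placeholder path to the .hdf5 output of the trainclassifier function
    fig = plot_barchart('/path/to/estimator.hdf5',
                        estimators=10,
                        output_png='barchart.png')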
plot_boxplot_features Module¶
- WORC.plotting.plot_boxplot_features.generate_feature_boxplots(image_features, label_data, output_zip, dpi=500, verbose=False)[source]¶
Generate boxplots of the feature values among different objects.
Parameters¶
- image_features: list, mandatory
List with a dictionary of the feature labels and values for each patient.
- label_data: pandas dataframe, mandatory
Dataframe containing the labels of the objects.
- output_zip: path, mandatory
Zip file to which the output boxplots should be written.
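A minimal call sketch; image_features and label_data are placeholders constructed elsewhere in a WORC pipeline, following the documented formats:

    from WORC.plotting.plot_boxplot_features import generate_feature_boxplots

    # image_features: list with a dictionary of feature labels and values per patient
    # label_data: pandas dataframe containing the labels of the objects
    # (both are placeholders here)
    generate_feature_boxplots(image_features, label_data,
                              output_zip='boxplots.zip', dpi=500)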
plot_boxplot_performance Module¶
plot_errors Module¶
plot_estimator_performance Module¶
- WORC.plotting.plot_estimator_performance.combine_multiple_estimators(predictions, label_data, N_1, N_2, multilabel_type, label_types, ensemble=1, strategy='argmax', alpha=0.95)[source]¶
Combine multiple estimators into a single model.
Note: the multilabel_type labels should correspond to the ordering in label_types. Hence, if multilabel_type = 0, the prediction is label_types[0], etc.
- WORC.plotting.plot_estimator_performance.compute_statistics(y_truth, y_score, y_prediction, modus, regression)[source]¶
Compute statistics on predictions.
- WORC.plotting.plot_estimator_performance.fit_thresholds(thresholds, estimator, label_type, X_train, Y_train, ensemble_method, ensemble_size, ensemble_scoring)[source]¶
- WORC.plotting.plot_estimator_performance.plot_estimator_performance(prediction, label_data, label_type, crossval_type=None, alpha=0.95, ensemble_method='top_N', ensemble_size=100, verbose=True, ensemble_scoring=None, output=None, modus=None, thresholds=None, survival=False, shuffle_estimators=False, bootstrap=None, bootstrap_N=None, overfit_scaler=None, save_memory=True, refit_ensemble=False)[source]¶
Plot the output of a single estimator, e.g. an SVM.
Parameters¶
- prediction: pandas dataframe or string, mandatory
Output of the trainclassifier function: either a pandas dataframe or an HDF5 file.
- label_data: string, mandatory
Contains the path referring to a .txt file containing the patient label(s) and value(s) to be used for learning. See the Github Wiki for the format.
- label_type: string, mandatory
Name of the label to extract from the label data to test the estimator on.
- alpha: float, default 0.95
Confidence level used for the confidence intervals.
- ensemble_method: string, default ‘top_N’
Determine which method to use for creating the ensemble. Choices: top_N or Caruana
- ensemble_size: int, default 100
Determine the size of the ensemble. Only relevant for top_N
- verbose: boolean, default True
Print intermediate messages.
- ensemble_scoring: string, default None
Metric to be used for evaluating the ensemble. If None, the option set in the prediction object will be used.
- output: string, default stats
Determine which results are returned. If 'stats', the statistics of the estimator will be returned. If 'scores', the scores will be returned.
- thresholds: list of integer(s), default None
If None, use the default sklearn threshold (0.5) on the posteriors to convert them to a binary prediction. If one value is provided, use that as the threshold. If two values are provided, posterior < thresholds[0] maps to 0 and posterior > thresholds[1] maps to 1.
Returns¶
Depending on the output parameter, the following outputs are returned (a usage sketch follows this list):
If output == 'stats':
- stats: dictionary
Contains the confidence intervals of the performance metrics and the number of times each patient was classified correctly or incorrectly.
If output == 'scores':
- y_truths: list
Contains the true label for each object.
- y_scores: list
Contains the score (e.g. posterior) for each object.
- y_predictions: list
Contains the predicted label for each object.
- pids: list
Contains the patient ID/name for each object.
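A minimal usage sketch; the paths and label name are placeholders, and the exact layout of the returned stats dictionary is an assumption:

    from WORC.plotting.plot_estimator_performance import plot_estimator_performance

    stats = plot_estimator_performance(
        prediction='/path/to/estimator.hdf5',   # placeholder: trainclassifier output
        label_data='/path/to/labels.txt',       # placeholder: patient label file
        label_type='tumor_grade',               # hypothetical label name
        alpha=0.95,
        ensemble_method='top_N',
        ensemble_size=100,
        output='stats')

    # Assumed layout: metric name -> confidence interval
    for metric, ci in stats.items():
        print(metric, ci)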
plot_hyperparameters Module¶
- WORC.plotting.plot_hyperparameters.plot_hyperparameters(prediction, label_type=None, estsize=50, output=None, removeconstants=False, verbose=False)[source]¶
Gather which hyperparameters have been used in the best workflows.
Parameters¶
- prediction: pandas dataframe or string, mandatory
Output of the trainclassifier function: either a pandas dataframe or an HDF5 file.
- estsize: integer, default 50
Number of estimators that should be taken into account.
- output: filename of csv, default None
Output file to write to. If None, no output file is written; the result is only returned as a variable.
- removeconstants: boolean, default False
Determine whether to remove any hyperparameters which have the same value in all workflows.
- verbose: boolean, default False
Whether to show print messages or not.
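A minimal usage sketch; the input path is a placeholder:

    from WORC.plotting.plot_hyperparameters import plot_hyperparameters

    # Placeholder path to the trainclassifier .hdf5 output; with output given,
    # the result is also written to the csv file
    hyperparameters = plot_hyperparameters('/path/to/estimator.hdf5',
                                           estsize=50,
                                           output='hyperparameters.csv',
                                           removeconstants=True)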
plot_images Module¶
- WORC.plotting.plot_images.plot_im_and_overlay(image, mask=None, figsize=(3, 3), alpha=0.4, color='cyan', colormap='gray', colorbar=False)[source]¶
Plot an image in a matplotlib figure and overlay with a mask.
- WORC.plotting.plot_images.slicer(image, mask=None, output_name=None, output_name_zoom=None, thresholds=[-5, 5], zoomfactor=4, dpi=500, normalize=True, expand=False, boundary=False, square=False, flip=True, rot90=0, alpha=0.4, axis='axial', index=None, color='cyan', radius=2, colormap='gray', fill=False)[source]¶
Plot the slice of the image on which the mask is largest, with the mask as an overlay.
image and mask should both be arrays.
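A minimal usage sketch for the overlay function, assuming a 2-D array input and that a matplotlib figure is returned:

    import numpy as np
    from WORC.plotting.plot_images import plot_im_and_overlay

    # Hypothetical 2-D image with a square binary mask
    image = np.random.rand(64, 64)
    mask = np.zeros((64, 64))
    mask[20:40, 20:40] = 1

    # Return value assumed to be a matplotlib figure
    fig = plot_im_and_overlay(image, mask=mask, alpha=0.4, color='cyan')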
plot_pvalues_features Module¶
plot_ranked_scores Module¶
- WORC.plotting.plot_ranked_scores.flatten_object(input)[source]¶
Flatten various objects to a 1D list.
- WORC.plotting.plot_ranked_scores.plot_ranked_images(pinfo, label_type, images, segmentations, ranked_truths, ranked_scores, ranked_PIDs, output_zip=None, output_itk=None, zoomfactor=4, scores='percentages')[source]¶
- WORC.plotting.plot_ranked_scores.plot_ranked_percentages(estimator, pinfo, label_type=None, ensemble_method='top_N', ensemble_size=100, output_csv=None)[source]¶
- WORC.plotting.plot_ranked_scores.plot_ranked_posteriors(estimator, pinfo, label_type=None, ensemble_method='top_N', ensemble_size=100, output_csv=None)[source]¶
- WORC.plotting.plot_ranked_scores.plot_ranked_scores(estimator, pinfo, label_type, scores='percentages', images=[], segmentations=[], ensemble_method='top_N', ensemble_size=100, output_csv=None, output_zip=None, output_itk=None)[source]¶
Rank the patients according to their average score. The score can either be the average posterior or the percentage of times the patient was classified correctly in the cross-validations. Additionally, the middle slice of each patient is plotted and saved according to the ranking. A usage sketch follows the parameter list below.
Parameters¶
- estimator: filepath, mandatory
Path pointing to the .hdf5 file which is the output of the trainclassifier function.
- pinfo: filepath, mandatory
Path pointing to the .txt file which contains the patient label information.
- label_type: string, default None
The name of the label predicted by the estimator. If None, the first label from the prediction file will be used.
- scores: string, default percentages
Type of scoring to be used. Either ‘posteriors’ or ‘percentages’.
- images: list, optional
List containing the filepaths to the ITKImage image files of the patients.
- segmentations: list, optional
List containing the filepaths to the ITKImage segmentation files of the patients.
- ensemble_method: string, optional
Method to be used for ensembling.
- ensemble_size: int, optional
If top_N method is used, number of workflows to be included in ensemble.
- output_csv: filepath, optional
If given, the scores will be written to this csv file.
- output_zip: filepath, optional
If given, the images will be plotted and the pngs saved to this zip file.
- output_itk: filepath, optional
WIP
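A minimal usage sketch for plot_ranked_scores, as referenced above; the paths and label name are placeholders:

    from WORC.plotting.plot_ranked_scores import plot_ranked_scores

    plot_ranked_scores(estimator='/path/to/estimator.hdf5',  # placeholder
                       pinfo='/path/to/labels.txt',          # placeholder
                       label_type='tumor_grade',             # hypothetical label name
                       scores='percentages',
                       ensemble_method='top_N',
                       ensemble_size=100,
                       output_csv='ranked_scores.csv')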