AP1 - Clinically Grounded Evaluation of Automated Segmentation Labels for Improvement of Computer Vision Models

POSTER

Connor R. Davey, Reid D. Jockisch, Sister M. Pieta, Bradley P. Sutton, Samuel Hawkins, Matthew T. Bramlet

Current methods of evaluating machine-learned medical segmentation algorithms are inadequate for clinical workflows and for targeted improvement of models. We have developed a methodology for grading these models based on clinical relevance and performance, rather than on simple comparison to ground truth data. We hypothesize that this clinically focused grading method will allow us to strategically target poorly performing machine-labeled anatomies and improve their accuracy with manual intervention. To determine the effectiveness of the grading method, several raters independently reviewed twenty cases to establish inter- and intra-rater reliability; we plan to expand the number of cases in the future. Initial analysis of the grading data has identified clear targets for improving the machine models, even though conventional evaluations indicated that the same models performed very well.
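The abstract does not name the reliability statistic or the grading scale. The sketch below is therefore only an illustration under stated assumptions. It uses unweighted Cohen's kappa, a swapped-in choice and not necessarily the authors' method, to measure agreement between two sets of grades on the same items. These can be two raters (inter-rater) or one rater at two sessions (intra-rater). It then ranks anatomies by mean grade to surface targets for manual correction. The 1 to 4 grade scale, the function names, and the structure labels are all hypothetical toy values, not study data.

```python
from collections import Counter
from statistics import mean


def cohens_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two gradings of the same items.

    Use two raters for inter-rater agreement, or one rater's two
    sessions for intra-rater agreement. (Chosen here for illustration;
    the abstract does not specify the statistic.)
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("both gradings must cover the same, non-empty set of items")
    n = len(grades_a)
    # Observed agreement: fraction of items given identical grades.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Expected chance agreement from each grading's marginal frequencies.
    counts_a, counts_b = Counter(grades_a), Counter(grades_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both gradings used one identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def rank_anatomies_worst_first(grades_by_anatomy):
    """Sort anatomy names by mean grade, lowest (worst) first.

    Assumes a hypothetical scale where a higher grade means more
    clinically acceptable.
    """
    return sorted(grades_by_anatomy, key=lambda name: mean(grades_by_anatomy[name]))


if __name__ == "__main__":
    # Hypothetical grades on a 1-4 scale (toy values, not study data).
    rater_1 = [4, 3, 4, 2, 1, 4]
    rater_2 = [4, 3, 3, 2, 1, 4]
    print(f"kappa = {cohens_kappa(rater_1, rater_2):.3f}")

    grades = {
        "structure_A": [4, 4, 3],
        "structure_B": [2, 1, 2],
        "structure_C": [3, 4, 4],
    }
    print(rank_anatomies_worst_first(grades))
```

With more than two raters, a multi-rater statistic such as Fleiss' kappa or an intraclass correlation coefficient would be a more natural fit. Because the grades are ordinal, a weighted kappa, which gives partial credit for near-misses, is another reasonable choice.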