Classifying Breast Tumors as Malignant or Benign Using Digitized Images of Fine Needle Aspiration Samples of Breast Mass Tissue: An Application of Classification Tree Algorithms

Document Type: Original Research

Authors
Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
Abstract
Introduction: Breast cancer is a major public health problem worldwide, and early detection is critical to effective treatment. Fine needle aspiration (FNA) is a minimally invasive method for obtaining cellular material from breast masses for analysis. However, pathologists' assessment of FNA samples can be subjective and time-consuming, which leads to variability in diagnostic results. Machine learning algorithms, including classification tree models, can potentially improve the consistency and precision of breast tumor classification by categorizing digitized images of FNA samples as malignant or benign.

Methods: We used classification tree algorithms, including CART, Ctree, Evtree, QUEST, CRUISE, and GUIDE, to distinguish between malignant and benign tumors in the Wisconsin Breast Cancer Dataset (WBCD). Model performance was evaluated using measures of diagnostic accuracy, including sensitivity, specificity, false positive and false negative rates, positive and negative predictive values, Youden's index, accuracy, positive and negative likelihood ratios, the diagnostic odds ratio, and the area under the ROC curve (AUC).
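Apart from the AUC, which needs the model's scores, the measures listed in the Methods all derive from a 2x2 table of predicted against true tumor class. The following is a minimal Python sketch of those calculations, assuming malignant is treated as the positive class. The names `ConfusionCounts`, `diagnostic_measures`, and `auc`, and the counts and scores in the usage lines, are hypothetical illustrations and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts; malignant is assumed to be the positive class."""
    tp: int  # malignant tumors classified as malignant
    fp: int  # benign tumors classified as malignant
    fn: int  # malignant tumors classified as benign
    tn: int  # benign tumors classified as benign


def _ratio(num: float, den: float) -> float:
    """Divide, returning inf (or nan for 0/0) when the denominator is zero."""
    if den == 0:
        return math.nan if num == 0 else math.inf
    return num / den


def diagnostic_measures(c: ConfusionCounts) -> dict:
    """Compute the diagnostic accuracy measures named in the Methods."""
    sens = _ratio(c.tp, c.tp + c.fn)  # true positive rate
    spec = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    fpr = _ratio(c.fp, c.fp + c.tn)   # equals 1 - specificity
    fnr = _ratio(c.fn, c.fn + c.tp)   # equals 1 - sensitivity
    return {
        "sensitivity": sens,
        "specificity": spec,
        "false_positive_rate": fpr,
        "false_negative_rate": fnr,
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "youden_index": sens + spec - 1,
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "lr_positive": _ratio(sens, fpr),
        "lr_negative": _ratio(fnr, spec),
        "diagnostic_odds_ratio": _ratio(c.tp * c.tn, c.fp * c.fn),
    }


def auc(scores_pos, scores_neg) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    counts = ConfusionCounts(tp=40, fp=5, fn=10, tn=45)  # made-up counts
    for name, value in diagnostic_measures(counts).items():
        print(f"{name}: {value:.3f}")
    # made-up predicted malignancy scores for malignant and benign cases
    print(f"auc: {auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.4]):.3f}")
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a dataset of a few hundred samples; a rank-based formula would be preferable for larger data. When a cell of the table is zero, the likelihood ratios and the diagnostic odds ratio can be infinite or undefined, which this sketch reports as `inf` or `nan` rather than applying a continuity correction.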

Results: The CRUISE algorithm showed excellent diagnostic performance in distinguishing between malignant and benign tumors.

Conclusion: The results emphasize the critical role of integrating machine learning models into clinical practice to assist pathologists, improve diagnostic outcomes, and reduce subjectivity in cancer classification.
