Performance Analysis of Vision Transformer (ViT), ResNet50, and MobileNetV3 Large in Multiclass Bone Fracture Classification

Ei Phyu Sin Win; Phyo Thu Zar Tun

doi:10.33022/ijcs.v15i3.5150

Authors

Ei Phyu Sin Win, Mandalay Technological University, Myanmar
Phyo Thu Zar Tun, Mandalay Technological University, Myanmar

DOI:

https://doi.org/10.33022/ijcs.v15i3.5150

Keywords:

Bone Fracture Classification, Real-time Medical Imaging, Vision Transformer (ViT), MobileNetV3

Abstract

Automated classification of bone fractures has become a cornerstone of modern emergency radiology, significantly enhancing diagnostic speed and precision. This study compares three leading deep learning architectures, ResNet50, MobileNetV3, and Vision Transformer (ViT), using a diverse dataset that includes various fracture modalities, healthy X-rays, and non-radiological images. The experimental results show that the Vision Transformer (ViT) attained the highest diagnostic accuracy at 95%, marginally outperforming MobileNetV3 and ResNet50, which both achieved 94%. While all three models demonstrated flawless reliability (100%) in identifying the fourteen-class bone categories, their performance diverged when analyzing complex fracture patterns.
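The gap between a model's overall accuracy and its per-class results can be illustrated with a minimal sketch. The class names and labels below are hypothetical toy values, not the study's data, and the helper functions are illustrative rather than the authors' evaluation code. The sketch only shows how a model can score 100% on some classes while its overall accuracy is pulled down by a harder fracture class.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class: correctly predicted samples / samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy labels (assumed class names, four samples per class).
y_true = (["healthy"] * 4 + ["non_xray"] * 4
          + ["comminuted"] * 4 + ["spiral"] * 4)
# One comminuted fracture is mistaken for a spiral fracture.
y_pred = (["healthy"] * 4 + ["non_xray"] * 4
          + ["comminuted"] * 3 + ["spiral"] + ["spiral"] * 4)

accuracy = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {accuracy:.4f}")
for label, value in recall.items():
    print(f"{label}: {value:.2f}")
```

In this toy case three classes are recalled perfectly, yet a single confusion between two fracture types lowers the overall figure, which is why per-class metrics are worth reporting next to a headline accuracy.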