Performance Analysis of Vision Transformer (ViT), ResNet50, and MobileNetV3 Large in Multiclass Bone Fracture Classification
Keywords:
Bone Fracture Classification, Real-time Medical Imaging, Vision Transformer (ViT), MobileNetV3
Abstract
Automated classification of bone fractures has become a cornerstone of modern emergency radiology, significantly improving diagnostic speed and precision. This study compares the efficacy of three leading deep learning architectures, ResNet50, MobileNetV3, and Vision Transformer (ViT), on a diverse dataset that includes various fracture types, healthy X-rays, and non-radiological images. The experimental results show that ViT attained the highest diagnostic accuracy at 95%, marginally outperforming MobileNetV3 and ResNet50, which both achieved 94%. Although all three models achieved flawless reliability (100%) in identifying fourteen bone categories, their performance diverged on complex fracture patterns.
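The abstract reports both an overall accuracy (95% for ViT, 94% for the other two models) and 100% reliability on individual bone categories. These are different quantities, and a model can score perfectly on some classes while its overall accuracy stays below 100%. The minimal sketch below computes both from a confusion matrix. It assumes that "reliability" means per-class recall, which the abstract does not define. The function names and the 3-class counts are hypothetical illustrations, not the authors' code or data.

```python
from typing import List, Sequence


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm: Sequence[Sequence[int]]) -> List[float]:
    """Row i holds samples whose true class is i; recall_i = cm[i][i] / sum(row i)."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted).
# These counts are illustrative only and are NOT the study's results.
cm = [
    [50, 0, 0],   # class A: every sample correct -> 100% recall
    [0, 40, 10],  # class B: 10 samples misclassified as class C
    [0, 5, 45],   # class C: 5 samples misclassified as class B
]

print(f"overall accuracy: {overall_accuracy(cm):.2f}")                 # overall accuracy: 0.90
print("per-class recall:", [round(r, 2) for r in per_class_recall(cm)])  # per-class recall: [1.0, 0.8, 0.9]
```

In this toy case, class A is recognised perfectly, yet errors between the two harder classes pull overall accuracy down to 90%. This mirrors the pattern the abstract describes, where complex fracture patterns account for the gap.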
License
Copyright (c) 2026 eiphyu sinwin, phyothuzartun

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





