Performance Analysis of Vision Transformer (ViT), ResNet50, and MobileNetV3 Large in Multiclass Bone Fracture Classification

Authors

  • Ei Phyu Sin Win, Mandalay Technological University, Myanmar
  • Phyo Thu Zar Tun, Mandalay Technological University, Myanmar

Keywords:

Bone Fracture Classification, Real-time Medical Imaging, Vision Transformer (ViT), MobileNetV3

Abstract

Automated classification of bone fractures has become a cornerstone of modern emergency radiology, significantly enhancing diagnostic speed and precision. This study compares three leading deep learning architectures, ResNet50, MobileNetV3, and Vision Transformer (ViT), on a diverse dataset that includes various fracture modalities, healthy X-rays, and non-radiological images. The experimental results show that ViT attained the highest diagnostic accuracy at 95%, marginally outperforming MobileNetV3 and ResNet50, which both achieved 94%. While all three models demonstrated flawless reliability (100%) in identifying fourteen-class bone categories, their performance diverged when analyzing complex fracture patterns.
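The gap between an overall accuracy figure and per-class reliability, as described in the abstract, can be illustrated with a minimal Python sketch. The class names and labels below are hypothetical toy values chosen for illustration; they are not taken from the study's dataset or results, and the helper functions are not the authors' evaluation code.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical toy labels (assumption): two easy classes and one harder class.
y_true = ["normal"] * 4 + ["non_xray"] * 4 + ["comminuted"] * 2
y_pred = ["normal"] * 4 + ["non_xray"] * 4 + ["comminuted", "normal"]

accuracy = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

In this toy case a model can be perfect on some classes while a single harder class pulls the overall accuracy down, which is why per-class metrics are usually reported alongside the aggregate figure.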

Published

25-05-2026