Abstract
Skin cancer is the most common cancer globally. Because early detection can drastically increase patient survival rates, a computerized system for classifying images of skin lesions can save time and, by extension, lives. In this paper, we extend a traditional convolutional neural network (CNN) that takes only images as input into a multimodal model that concatenates the CNN's image features with metadata features. Our results show that the multimodal model outperforms the image-only (unimodal) model, increasing accuracy by 12.15% on average. We further improve the model by comparing CNN architectures, specifically ResNet-18 and VGG16. Using ResNet-18 increased our accuracy by 9.81% on average, and we support these results by applying the Grad-CAM algorithm to our skin lesion images to visualize the image regions that drive the model's predictions.
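The fusion step described above, concatenating CNN image features with metadata features before classification, can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the backbone is a hypothetical stand-in (one linear projection plus ReLU in place of a real ResNet-18 or VGG16 feature extractor), and the metadata fields (age, sex, lesion site) and their encoding are hypothetical, since the abstract does not specify which metadata is used. The names `cnn_features`, `metadata_features`, and `multimodal_predict` are illustrative only.

```python
import math
import random

# Hypothetical lesion-site categories used only for this illustration.
SITES = ["head/neck", "trunk", "upper extremity", "lower extremity"]


def cnn_features(pixels, backbone_w):
    # Hypothetical stand-in for a CNN backbone (e.g. ResNet-18 without its
    # final classification layer): one linear projection followed by ReLU.
    return [max(0.0, sum(w * p for w, p in zip(row, pixels))) for row in backbone_w]


def metadata_features(age, sex, site):
    # Hypothetical metadata encoding: scaled age, binary sex flag,
    # and a one-hot vector for the lesion site.
    return [age / 100.0, 1.0 if sex == "male" else 0.0] + [
        1.0 if site == s else 0.0 for s in SITES
    ]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multimodal_predict(pixels, age, sex, site, backbone_w, head_w, head_b):
    # Concatenate the image feature vector with the metadata feature vector,
    # then apply a single linear classification head and softmax.
    fused = cnn_features(pixels, backbone_w) + metadata_features(age, sex, site)
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(head_w, head_b)]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    n_pixels, n_img_feats, n_classes = 16, 8, 3  # toy sizes, not from the paper
    n_fused = n_img_feats + 2 + len(SITES)
    backbone_w = [[rng.uniform(-1, 1) for _ in range(n_pixels)] for _ in range(n_img_feats)]
    head_w = [[rng.uniform(-1, 1) for _ in range(n_fused)] for _ in range(n_classes)]
    head_b = [0.0] * n_classes
    pixels = [rng.random() for _ in range(n_pixels)]
    probs = multimodal_predict(pixels, 54, "female", "trunk", backbone_w, head_w, head_b)
    print([round(p, 3) for p in probs])
```

Dropping the `metadata_features(...)` term from the concatenation recovers the unimodal, image-only baseline, which is the comparison the abstract reports. In a real system the stand-in backbone would be replaced by a pretrained CNN and all weights would be learned jointly.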