Dynamic gesture classification of American Sign Language using deep learning
Abstract
American Sign Language (ASL) is a visual method of communication used primarily by hearing-impaired people. ASL is a sign language defined by five fundamental parameters: handshape, location (place of articulation), movement, palm orientation, and facial expressions. Since it is the most widely known sign language in the world, it is essential to address dynamic sign gesture recognition for American Sign Language. Static sign recognition in ASL has been studied extensively, with researchers reporting approximately 99% accuracy, but very few studies are currently available on dynamic gesture recognition in ASL. In this study, a subset of the Word-Level American Sign Language (WLASL) dataset, which originally contains more than 2000 classes for gesture-based classification of American Sign Language, was used; from it we chose 100 classes. VGG16-LSTM, VGG19-LSTM, ResNet101-LSTM, and Inception-LSTM models, in which a Convolutional Neural Network (CNN) extracts spatial features and a Long Short-Term Memory (LSTM) network extracts temporal features, together with an Inception3D model, were applied to the processed and extracted classes of videos from the WLASL dataset. We found that our Inception3D model outperformed the Visual Geometry Group-Long Short-Term Memory (VGG-LSTM) architectures and the ResNet101-LSTM model. These models were compared on the basis of classification accuracy, providing suitable insights for model selection.
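To illustrate the CNN-LSTM design described above, the following is a minimal sketch (not the authors' exact implementation) of a VGG16-LSTM video classifier in Keras: a frozen VGG16 backbone extracts spatial features from each frame, and an LSTM aggregates them over time. The frame count, image size, and LSTM width are assumed values chosen only for demonstration; the 100 output classes correspond to the WLASL-100 subset.

```python
# Illustrative sketch of a VGG16-LSTM video classifier (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_FRAMES = 30      # frames sampled per video (assumption)
IMG_SIZE = 224       # VGG16 input resolution
NUM_CLASSES = 100    # WLASL-100 subset

# Frozen VGG16 backbone serves as a per-frame spatial feature extractor.
backbone = VGG16(weights="imagenet", include_top=False,
                 pooling="avg", input_shape=(IMG_SIZE, IMG_SIZE, 3))
backbone.trainable = False

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, IMG_SIZE, IMG_SIZE, 3)),
    # Apply the CNN to every frame independently (spatial features).
    layers.TimeDistributed(backbone),
    # LSTM models dependencies across the frame sequence (temporal features).
    layers.LSTM(256),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The other hybrid models follow the same pattern with the backbone swapped (VGG19, ResNet101, InceptionV3), whereas Inception3D replaces the two-stage pipeline with 3D convolutions that learn spatial and temporal features jointly.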