Emotion-centric image captioning using a self-critical mean teacher learning approach

Yousefi, Aryan

dc.contributor.author	Yousefi, Aryan
dc.date.accessioned	2023-11-17T20:19:55Z
dc.date.available	2023-11-17T20:19:55Z
dc.date.issued	2022-11-07
dc.description.abstract	Image captioning is the multimodal task of automatically generating natural-language descriptions of a visual input using deep learning techniques. This research area lies at the intersection of computer vision and natural language processing, and it has gained increasing popularity over the past few years. Image captioning is an important part of scene understanding, with applications such as assisting visually impaired people, making recommendations in editing applications, and supporting virtual assistants. However, most previous work on this topic has focused on purely objective, content-based descriptions of image scenes. The goal of this thesis is to generate more engaging captions by leveraging humanlike emotional responses in the captioning process. To this end, a Mean Teacher learning-based method is applied to the recently introduced ArtEmis dataset. The method sets up a self-distillation relationship between memory-augmented language models with meshed connectivity, which are first trained in a cross-entropy-based phase and then fine-tuned in a Self-Critical Sequence Training phase. In addition, we propose a novel classification module that decreases texture bias and encourages the model towards shape-based classification. We also propose a method that uses extra emotional supervision signals in the caption generation process, leveraging the image-to-emotion classifier. Compared with the state-of-the-art results on the ArtEmis dataset, our proposed model significantly outperforms the current benchmark on multiple popular evaluation metrics, such as BLEU, METEOR, ROUGE-L, and CIDEr.	en_US
dc.description.degree	Master of Science (MSc) in Computational Sciences	en_US
dc.identifier.uri	https://laurentian.scholaris.ca/handle/10219/4100
dc.language.iso	en	en_US
dc.publisher.grantor	Laurentian University of Sudbury	en_US
dc.subject	Image captioning	en_US
dc.subject	computer vision	en_US
dc.subject	natural language processing	en_US
dc.subject	mean teacher learning	en_US
dc.subject	self-critical sequence training	en_US
dc.title	Emotion-centric image captioning using a self-critical mean teacher learning approach	en_US
dc.type	Thesis	en_US
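
The two training ideas named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy scalar "weights", and the decay value are hypothetical, and the reward is assumed to be some caption-level score (the abstract does not say which reward or baseline is used). The first function shows the usual Mean Teacher update, where the teacher's weights are an exponential moving average of the student's; the second shows the self-critical surrogate loss, which scales the log-probability of a sampled caption by how much its reward exceeds a baseline reward.

```python
from typing import Dict, List


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    decay: float = 0.999,  # hypothetical value, not taken from the thesis
) -> Dict[str, float]:
    """Mean Teacher step: teacher <- decay * teacher + (1 - decay) * student."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def self_critical_loss(
    sample_log_probs: List[float],
    sample_reward: float,
    baseline_reward: float,
) -> float:
    """Self-critical surrogate loss for one sampled caption.

    The advantage is the sampled caption's reward minus a baseline reward.
    Minimising the returned value raises the probability of captions that
    beat the baseline and lowers it for captions that do not.
    """
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_log_probs)


if __name__ == "__main__":
    # Toy example with a single scalar parameter "w".
    new_teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)
    print(new_teacher)

    # Two-token sampled caption whose reward beats the baseline by 0.5.
    print(self_critical_loss([-1.0, -2.0], sample_reward=1.5, baseline_reward=1.0))
```

In a real model the dictionaries would be replaced by the parameter tensors of the student and teacher language models, and the log-probabilities would come from the decoder's output distribution over tokens.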

Files

Original bundle

Name: Thesis FINAL-Aryan Yousefi_14_Nov-2022.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.52 KB
Format: Item-specific license agreed upon to submission

Collections

Computational Sciences - Master's theses