Emotion-centric image captioning using a self-critical mean teacher learning approach

dc.contributor.author: Yousefi, Aryan
dc.date.accessioned: 2023-11-17T20:19:55Z
dc.date.available: 2023-11-17T20:19:55Z
dc.date.issued: 2022-11-07
dc.description.abstract: Image Captioning is the multi-modal task of automatically generating natural language descriptions of a visual input using Deep Learning techniques. This research area lies at the intersection of the Computer Vision and Natural Language Processing fields, and it has gained increasing popularity over the past few years. Image Captioning is an important part of scene understanding with a wide range of applications, such as helping visually impaired people, recommendations in editing applications, and usage in virtual assistants. However, most previous work on this topic has focused on purely objective, content-based descriptions of image scenes. The goal of this thesis is to generate more engaging captions by leveraging human-like emotional responses in the captioning process. To achieve this, a Mean Teacher Learning-based method has been applied to the recently introduced ArtEmis dataset. This method includes a self-distillation relationship between memory-augmented language models with meshed connectivity, which are first trained in a cross-entropy-based phase and then fine-tuned in a Self-Critical Sequence Training phase. In addition, we propose a novel classification module that decreases texture bias and encourages the model towards shape-based classification. We also propose a method to utilize extra emotional supervision signals in the caption generation process, leveraging the image-to-emotion classifier. Compared with the state-of-the-art results on the ArtEmis dataset, our proposed model significantly outperforms the current benchmark on multiple popular evaluation metrics, such as BLEU, METEOR, ROUGE-L, and CIDEr. [en_US]
dc.description.degree: Master of Science (MSc) in Computational Sciences [en_US]
dc.identifier.uri: https://laurentian.scholaris.ca/handle/10219/4100
dc.language.iso: en [en_US]
dc.publisher.grantor: Laurentian University of Sudbury [en_US]
dc.subject: Image captioning [en_US]
dc.subject: computer vision [en_US]
dc.subject: natural language processing [en_US]
dc.subject: mean teacher learning [en_US]
dc.subject: self-critical sequence training [en_US]
dc.title: Emotion-centric image captioning using a self-critical mean teacher learning approach [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: Thesis FINAL-Aryan Yousefi_14_Nov-2022.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 6.52 KB
Format: Item-specific license agreed upon to submission