Enhancing neural mean teacher learning-based emotion-centric model for image captioning

dc.contributor.advisor: Dr. Kalpdrum Passi
dc.contributor.author: Piramoon, Majid
dc.date.accessioned: 2024-11-27T19:48:53Z
dc.date.available: 2024-11-27T19:48:53Z
dc.date.issued: 2023-11-09
dc.description.abstract: Image captioning is a task at the intersection of computer vision and natural language processing: generating a textual description of the content of an image. The goal is a system that accurately recognizes the objects, attributes, and relationships depicted in an image and describes them in natural language, typically as a sentence or short paragraph. One state-of-the-art method for image captioning is Nemesis (Neural Mean Teacher Learning-based Emotion-centric Speaker), a neural speaker that leverages emotional supervision signals during caption generation. Nemesis has been applied to the recently introduced ArtEmis dataset, the first large-scale dataset for emotion-centric image captioning, which contains 455K emotional descriptions of 80K artworks from WikiArt. In this study, I employed a simple but improved variant of Self-Critical Sequence Training, obtained by changing the choice of baseline function in the REINFORCE algorithm. Compared with the baseline that uses greedy decoding, the updated baseline offers better performance at no additional cost.
dc.identifier.uri: https://laurentian.scholaris.ca/handle/10219/4226
dc.language.iso: en_CA
dc.publisher: Laurentian University Library & Archives
dc.rights.holder: Majid Piramoon
dc.rights.license: Laurentian University ETD license
dc.subject: Image captioning, Natural language processing, ArtEmis dataset, REINFORCE algorithm
dc.title: Enhancing neural mean teacher learning-based emotion-centric model for image captioning
dc.type: Thesis
thesis.degree.discipline: Computational Sciences
thesis.degree.grantor: Laurentian University (en_CA) & Université Laurentienne (fr_CA)
thesis.degree.level: 1
thesis.degree.name: Master of Science (MSc) in Computational Sciences
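
The abstract does not state which baseline replaces the greedy-decoding one. As a minimal sketch only, the Python below contrasts the standard Self-Critical Sequence Training advantage (sampled-caption reward minus greedy-caption reward) with one commonly used alternative, assumed here purely for illustration: the mean reward of the other captions sampled for the same image. The function names and the idea of passing precomputed rewards and log-probabilities as plain lists are hypothetical and are not taken from the thesis.

```python
from typing import List


def greedy_baseline_advantages(sample_rewards: List[float],
                               greedy_reward: float) -> List[float]:
    """Standard SCST: compare each sampled caption with the greedy caption."""
    return [r - greedy_reward for r in sample_rewards]


def leave_one_out_advantages(sample_rewards: List[float]) -> List[float]:
    """Assumed alternative baseline (not confirmed by the abstract):
    each sample is compared with the mean reward of the OTHER samples."""
    k = len(sample_rewards)
    if k < 2:
        raise ValueError("need at least two sampled captions per image")
    total = sum(sample_rewards)
    return [r - (total - r) / (k - 1) for r in sample_rewards]


def reinforce_loss(log_probs: List[float], advantages: List[float]) -> float:
    """Surrogate loss whose gradient w.r.t. the log-probabilities is the
    REINFORCE estimate: -mean(advantage * log p(caption))."""
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have equal length")
    weighted = [a * lp for a, lp in zip(advantages, log_probs)]
    return -sum(weighted) / len(weighted)
```

Under this assumption, the alternative baseline reuses the rewards of captions that are already sampled during training, so it needs no separate greedy decoding pass. In a real training loop the rewards would come from a caption metric and the log-probabilities from the captioning model.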

Files

Original bundle

Name: Majid Piramoon - FINAL THESIS - 14-June-2024.pdf
Size: 2.24 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.92 KB
Format: Item-specific license agreed upon to submission