Enhancing neural mean teacher learning-based emotion-centric model for image captioning
dc.contributor.advisor | Dr. Kalpdrum Passi | |
dc.contributor.author | Piramoon, Majid | |
dc.date.accessioned | 2024-11-27T19:48:53Z | |
dc.date.available | 2024-11-27T19:48:53Z | |
dc.date.issued | 2023-11-09 | |
dc.description.abstract | Image captioning is a task at the intersection of computer vision and natural language processing: generating a textual description of the content of an image. The goal is a system that accurately recognizes the objects, attributes, and relationships depicted in an image and describes them in natural language, typically as a sentence or short paragraph. One state-of-the-art method for this task is Nemesis (Neural Mean Teacher Learning-based Emotion-centric Speaker), a neural speaker that leverages emotional supervision signals during caption generation. Nemesis has been applied to the recently introduced ArtEmis dataset, the first large-scale dataset for emotion-centric image captioning, which contains 455K emotional descriptions of 80K artworks from WikiArt. In this study, I employ a simple but improved variant of Self-Critical Sequence Training, obtained by changing the choice of baseline function in the REINFORCE algorithm. Compared with the baseline based on greedy decoding, the updated baseline improves performance at no additional cost. | |
dc.identifier.uri | https://laurentian.scholaris.ca/handle/10219/4226 | |
dc.language.iso | en_CA | |
dc.publisher | Laurentian University Library & Archives | |
dc.rights.holder | Majid Piramoon | |
dc.rights.license | Laurentian University ETD license | |
dc.subject | Image captioning, Natural language processing, ArtEmis dataset, REINFORCE algorithm | |
dc.title | Enhancing neural mean teacher learning-based emotion-centric model for image captioning | |
dc.type | Thesis | |
thesis.degree.discipline | Computational Sciences | |
thesis.degree.grantor | Laurentian University (en_CA) & Université Laurentienne (fr_CA) | |
thesis.degree.level | 1 | |
thesis.degree.name | Master of Science (MSc) in Computational Sciences |