Hatching Chick at SemEval-2018 Task 2: Multilingual Emoji Prediction

J. Coster, R. G. van Dalen, N. A. J. Stierman

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    As part of a SemEval-2018 shared task, an attempt was made to build a system capable of predicting the occurrence of a language's most frequently used emoji in tweets. Specifically, models for English and Spanish data were created and trained on 500,000 and 100,000 tweets respectively. To create these models, a logistic regressor, a sequential LSTM, a random forest regressor and an SVM were first tested. The SVM was found to perform best and was therefore optimized individually for both languages. During development, F1-scores of 61 and 82 were obtained for English and Spanish data respectively; in comparison, F1-scores on the official evaluation data were 21 and 18. The significant decrease in performance during evaluation might be explained by overfitting during development and might therefore have been partially prevented by using cross-validation. Overall, emoji which occur in a very specific context, such as a Christmas tree, were found to be most predictable.
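The abstract's remark about cross-validation can be illustrated with a minimal sketch. It is not the authors' code: it assumes the reported F1 is macro-averaged, and the names `macro_f1`, `k_fold_splits`, `cross_validate` and the stand-in classifier `train_majority` are hypothetical. The idea shown is that each fold is scored only on tweets the model did not see during training, so the averaged score is less likely to be inflated by overfitting.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over the labels present in gold."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(texts, labels, train_fn, k=5):
    """Return the held-out macro F1 for each of the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(texts), k):
        model = train_fn([texts[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        pred = [model(texts[i]) for i in test_idx]
        gold = [labels[i] for i in test_idx]
        scores.append(macro_f1(gold, pred))
    return scores


def train_majority(train_texts, train_labels):
    """Hypothetical stand-in classifier: always predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda text: most_common
```

In practice `train_majority` would be replaced by the real training routine (for example an SVM over tweet features), and the tweets would normally be shuffled before splitting; a large gap between the training score and the mean of the fold scores is the warning sign of overfitting that the abstract describes.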
    Original language: English
    Title of host publication: The International Workshop on Semantic Evaluation
    Subtitle of host publication: Proceedings of the Twelfth Workshop
    Place of Publication: New Orleans, Louisiana
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 445-448
    Number of pages: 4
    ISBN (Print): 978-1-948087-20-9
    Publication status: Published - 1-Jun-2018
    Event: The International Workshop on Semantic Evaluation: Proceedings of the Twelfth Workshop - New Orleans, United States
    Duration: 5-Jun-2018 to 6-Jun-2018

    Conference

    Conference: The International Workshop on Semantic Evaluation
    Country/Territory: United States
    City: New Orleans
    Period: 05/06/2018 to 06/06/2018
