The side benefit of behavior: using keystroke dynamics to inform Natural Language Processing

Barbara Plank

    Research output: Contribution to conference (abstract, academic)

    Abstract

    When people produce or read text, they also generate large amounts of
    behavioral data as a by-product. Examples include click-through data,
    but also more distant sources such as cognitive processing data, for
    instance eye tracking or keystroke dynamics. Such fortuitous data [5]
    represents a potentially immense resource of noisy auxiliary data.
    But can we use such auxiliary data to improve natural language
    processing? Very little work exists so far; the first promising
    attempts have focused mainly on gaze patterns, e.g., [1,2]. No prior
    work has explored keystroke dynamics.

    Keystroke dynamics concern a user's typing pattern. When a person
    types, the latencies between successive keystrokes and the duration
    of each keystroke reflect that person's unique typing behavior.
    Keystroke dynamics have been used extensively in psycholinguistic and
    writing research to gain insights into cognitive processing.
    Keystroke logs have a distinct advantage over other cognitive
    modalities such as eye tracking or brain scanning: they are readily
    available and easy to harvest, because they require no special
    equipment beyond a keyboard. Moreover, they are non-intrusive and
    inexpensive, and they have the potential to offer continuous
    adaptation to specific users, for example by integrating keystroke
    logging into (online) text processing tools. But do keystroke logs
    contain actual signal that can inform natural language processing
    (NLP) models?
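
    The two quantities named in this paragraph, the duration of each
    keystroke and the latency between successive keystrokes, can be
    sketched in a few lines of Python. This is a minimal illustration and
    not code from [4]: the KeyEvent record, its millisecond timestamp
    fields, and the choice to measure latency from key press to key press
    are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    """One logged keystroke (hypothetical record layout)."""
    key: str
    press_ms: int    # time the key went down, in milliseconds
    release_ms: int  # time the key came back up, in milliseconds


def hold_durations(events: List[KeyEvent]) -> List[int]:
    """Duration of each keystroke: release time minus press time."""
    return [e.release_ms - e.press_ms for e in events]


def press_latencies(events: List[KeyEvent]) -> List[int]:
    """Latency between successive keystrokes, measured press to press.

    Assumption: other definitions (e.g. release to next press) are
    equally plausible; this one is chosen only for illustration.
    """
    return [b.press_ms - a.press_ms for a, b in zip(events, events[1:])]


# Example log for the key sequence "t", "h", "e" (invented timings).
events = [
    KeyEvent("t", 0, 80),
    KeyEvent("h", 120, 190),
    KeyEvent("e", 260, 330),
]
durations = hold_durations(events)   # one value per keystroke
latencies = press_latencies(events)  # one value per adjacent pair
```

    A log of n keystrokes yields n durations and n - 1 latencies, so any
    per-word feature built on top of them has to decide how to aggregate
    these values.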

    We postulate that keystroke dynamics contain information about
    syntactic structure that can inform shallow syntactic parsing. To test
    this hypothesis, we perform first experiments in which we use
    keystroke dynamics as auxiliary data in a multi-task learning setup
    [3,4]. In particular, we refine the raw keystroke data, devise a
    simple approach to derive automatically labeled data from the raw
    keystroke logs (specifically, from pre-word pauses), and integrate it
    as an auxiliary task in a multi-task bidirectional LSTM model. We show the
    effectiveness of using auxiliary keystroke data on two shallow
    syntactic parsing tasks, chunking and CCG supertagging. Our model is
    simple, has the advantage that data can come from distinct sources,
    and produces models that are significantly better than models trained
    on the text annotations alone.
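
    As a rough illustration of how pre-word pauses could be turned into
    automatically labeled data, the sketch below extracts the pause before
    each word from a keystroke log and discretizes it into a coarse label
    that an auxiliary tagging task could predict. The log format (pairs of
    character and press time in milliseconds), the function names, the
    SHORT/LONG label set, and the median threshold are assumptions made
    for this example; the abstract does not specify the binning used in
    [4], and the sketch assumes a log that has already been cleaned (no
    backspaces or other edits).

```python
from statistics import median
from typing import List, Tuple


def pre_word_pauses(log: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (word, pause_ms) pairs from a cleaned keystroke log.

    The pause of a word is the time between the previous keystroke
    (usually a space) and the word's first character. The first word
    of the log has no preceding keystroke and gets a pause of 0.
    """
    pairs: List[Tuple[str, int]] = []
    word = ""
    pause = 0
    prev_time = None
    for char, press_ms in log:
        if char == " ":
            if word:  # a space closes the current word
                pairs.append((word, pause))
                word = ""
        else:
            if not word:  # first character of a new word
                pause = 0 if prev_time is None else press_ms - prev_time
            word += char
        prev_time = press_ms
    if word:  # flush the last word if the log does not end in a space
        pairs.append((word, pause))
    return pairs


def label_pauses(pairs: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Label each word SHORT or LONG relative to the median pause.

    Assumption: a median split is used purely for illustration.
    """
    if not pairs:
        return []
    threshold = median(p for _, p in pairs)
    return [(w, "LONG" if p > threshold else "SHORT") for w, p in pairs]


# Invented log for the text "the cat sat".
log = [
    ("t", 0), ("h", 100), ("e", 200), (" ", 300),
    ("c", 900), ("a", 1000), ("t", 1100), (" ", 1200),
    ("s", 1400), ("a", 1500), ("t", 1600),
]
labeled = label_pauses(pre_word_pauses(log))
```

    The result is a sequence of (word, label) pairs in the same shape as
    ordinary sequence-labeling data, which is what allows it to be used as
    a separate auxiliary task alongside chunking or CCG supertagging data
    from a different source.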

    Note: this work will be presented at COLING 2016, and the full text of
    this submission is available at [4].

    References:

    [1] Barrett, Maria; Søgaard, Anders. 2015. Using reading behavior to
    predict grammatical functions. EMNLP Workshop on Cognitive Aspects of
    Computational Language Learning. Lisbon, Portugal.

    [2] Klerke, Sigrid; Goldberg, Yoav; Søgaard, Anders. 2016. Improving
    sentence compression by learning to predict gaze. North American
    Chapter of the Association for Computational Linguistics (NAACL). San
    Diego, CA.

    [3] Plank, Barbara; Søgaard, Anders; Goldberg, Yoav. 2016.
    Multilingual Part-of-Speech Tagging with Bidirectional Long
    Short-Term Memory Models and Auxiliary Loss. Annual Meeting of the
    Association for Computational Linguistics (ACL). Berlin, Germany.

    [4] Plank, Barbara. 2016. Keystroke dynamics as signal for shallow
    syntactic parsing. The 26th International Conference on Computational
    Linguistics (COLING). Osaka, Japan.

    [5] Plank, Barbara. 2016. What to do about non-standard (or
    non-canonical) language in NLP. KONVENS. Bochum, Germany.

    Original language: English
    Publication status: Published - 2016
    Event: 11th Women in Machine Learning Workshop (WiML 2016) - Barcelona, Spain
    Duration: 5-Dec-2016 to 5-Dec-2016
    Conference number: 11