Abstract
When people produce or read texts, they generate large amounts of by-product in
the form of behavioral data. Examples include click-through data, but also
more distant sources such as cognitive processing data like eye
tracking or keystroke dynamics. Such fortuitous data [5] represents a
potentially immense side-benefit resource in the form of noisy
auxiliary data. But can we use such auxiliary data to improve
natural language processing? Very little work exists so far; the first
promising attempts have mainly focused on gaze pattern data, e.g.,
[1,2]. No prior work has yet explored keystroke dynamics for this purpose.
Keystroke dynamics concern a user's typing pattern. When a person
types, the latencies between successive keystrokes and the durations of
the keystrokes reflect that person's unique typing behavior. Keystroke dynamics
have been extensively used in psycholinguistic and writing research to
gain insights into cognitive processing. Compared with other cognitive
modalities such as eye tracking or brain scanning, keystroke logs have
the distinct advantage that they are readily available and can be
harvested easily, because they do not rely on any special equipment
beyond a keyboard. Moreover, they are non-intrusive, inexpensive, and
have the potential to offer continuous adaptation to specific
users. Imagine integrating keystroke logging into (online) text
processing tools. But do keystroke logs contain actual signal that
informs natural language processing (NLP) models?
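To make the two timing quantities mentioned above concrete, the following minimal sketch computes key hold durations and inter-key latencies from a short list of key events. The record format (key, press time, release time in milliseconds), the press-to-press definition of latency, and all names are assumptions made for illustration; they are not the logging format used in this work.

```python
from typing import List, NamedTuple


class KeyEvent(NamedTuple):
    """Hypothetical log record: one key with press/release times in ms."""
    key: str
    down_ms: int
    up_ms: int


def hold_durations(events: List[KeyEvent]) -> List[int]:
    """Time each key was held down (release time minus press time)."""
    return [e.up_ms - e.down_ms for e in events]


def inter_key_latencies(events: List[KeyEvent]) -> List[int]:
    """Press-to-press latency between successive keys (one common definition)."""
    return [b.down_ms - a.down_ms for a, b in zip(events, events[1:])]


# Made-up example: typing "the"
events = [KeyEvent("t", 0, 80), KeyEvent("h", 150, 220), KeyEvent("e", 260, 330)]
print(hold_durations(events))
print(inter_key_latencies(events))
```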
We postulate that keystroke dynamics contain information about
syntactic structure that can inform shallow syntactic parsing. To test
this hypothesis, we run initial experiments in which we use
keystroke dynamics as auxiliary data in a multi-task learning setup
[3,4]. In particular, we first refine the raw keystroke data,
then devise a simple approach to derive automatically labeled data from the raw
keystroke logs (specifically, from pre-word pauses), and finally integrate this data as
an auxiliary task in a multi-task bidirectional LSTM model. We show the
effectiveness of using auxiliary keystroke data on two shallow
syntactic parsing tasks, chunking and CCG supertagging. Our model is
simple, has the advantage that data can come from distinct sources,
and produces models that are significantly better than models trained
on the text annotations alone.
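As a rough illustration of how pre-word pauses might be turned into automatically labeled data, the sketch below extracts the pause before each word from a keystroke log and discretizes it into two labels. The input format, the two-way SHORT/LONG labeling, and the median-based threshold (`long_factor`) are assumptions chosen for this example; the actual labeling scheme is described in [4].

```python
from statistics import median
from typing import List, Tuple


def pre_word_pauses(keys: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """keys: (character, press time in ms) pairs in typing order.

    Returns (word, pause in ms before the word's first character).
    The first word of the log gets a pause of 0.
    """
    words = []
    current, pause, prev_time = "", 0, None
    for char, t in keys:
        if char == " ":
            if current:
                words.append((current, pause))
            current = ""
        else:
            if not current:  # first character of a new word
                pause = 0 if prev_time is None else t - prev_time
            current += char
        prev_time = t
    if current:
        words.append((current, pause))
    return words


def pause_labels(words: List[Tuple[str, int]], long_factor: float = 2.0):
    """Label a pause LONG if it exceeds long_factor times the median pause."""
    m = median(p for _, p in words)
    return [(w, "LONG" if p > long_factor * m else "SHORT") for w, p in words]


# Made-up log for the text "ab cd e"
keys = [("a", 0), ("b", 100), (" ", 200), ("c", 900),
        ("d", 1000), (" ", 1100), ("e", 1250)]
print(pause_labels(pre_word_pauses(keys)))
```

The resulting (word, label) pairs could then serve as training data for an auxiliary tagging task alongside the chunking or CCG supertagging annotations.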
Note: this work will be presented at COLING 2016, and the full text of
this submission is available at [4].
References:
[1] Maria Barrett and Anders Søgaard. 2015. Using reading behavior to
predict grammatical functions. EMNLP Workshop on Cognitive Aspects of
Computational Language Learning. Lisbon, Portugal.
[2] Sigrid Klerke, Yoav Goldberg, and Anders Søgaard. 2016. Improving
sentence compression by learning to predict gaze. North American
Chapter of the Association for Computational Linguistics (NAACL). San
Diego, CA.
[3] Barbara Plank, Anders Søgaard, and Yoav Goldberg. 2016. Multilingual
Part-of-Speech Tagging with Bidirectional Long Short-Term Memory
Models and Auxiliary Loss. ACL. Berlin, Germany.
[4] Barbara Plank. 2016. Keystroke dynamics as signal for shallow syntactic
parsing. The 26th International Conference on Computational
Linguistics (COLING). Osaka, Japan.
[5] Barbara Plank. 2016. What to do about non-standard (or non-canonical)
language in NLP. KONVENS. Bochum, Germany.
Original language | English |
---|---|
Publication status | Published - 2016 |
Event | 11th Women in Machine Learning Workshop (WiML 2016), Barcelona, Spain. Duration: 5-Dec-2016 → 5-Dec-2016. Conference number: 11 |
Workshop
Workshop | 11th Women in Machine Learning Workshop (WiML 2016) |
---|---|
Abbreviated title | WiML 2016 |
Country/Territory | Spain |
City | Barcelona |
Period | 05/12/2016 → 05/12/2016 |