Brill-NL: Brill's part-of-speech tagger for Dutch
Details
Brill-NL is a slightly modified version of the rule-based part-of-speech tagger developed by Eric Brill. It was trained on a subset of the Eindhoven corpus [1] using the WOTAN tagset [2].
The tagger is based on transformation-based error-driven learning [3], a technique that has been effective in a number of natural language applications, including part-of-speech and word sense tagging, prepositional phrase attachment, and syntactic parsing.
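To illustrate the general idea behind transformation-based error-driven learning, here is a minimal Python sketch. It is not the Brill-NL implementation. It assumes a single rule template ("change tag A to B when the previous tag is C"), a hypothetical toy lexicon, and simplified tags (Art, N, V, Adj, Pron) that are not the WOTAN tagset. The learner starts from a baseline tagging, repeatedly picks the rule that removes the most errors against the hand-tagged text, applies it, and stops when no rule helps.

```python
def baseline_tags(words, lexicon, default="N"):
    """Initial state: give every word its most likely tag from a lexicon."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """Apply 'change from_tag to to_tag when the previous tag is prev_tag'.

    Conditions are checked against the tags as they were before this rule
    fired, so a single application does not feed on its own output.
    """
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(current, gold, max_rules=10):
    """Greedily learn an ordered list of transformation rules."""
    rules = []
    for _ in range(max_rules):
        base = count_errors(current, gold)
        # Propose candidate rules only at positions that are currently wrong.
        candidates = {
            (current[i], gold[i], current[i - 1])
            for i in range(1, len(gold))
            if current[i] != gold[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = base - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule reduces the error count any further
            break
        current = apply_rule(current, best_rule)
        rules.append(best_rule)
    return rules, current


# Hypothetical toy data: "de was is schoon" / "het was koud".
# "was" is a noun (laundry) after an article, but a verb elsewhere.
words = ["de", "was", "is", "schoon", "het", "was", "koud"]
gold = ["Art", "N", "V", "Adj", "Pron", "V", "Adj"]
lexicon = {"de": "Art", "was": "V", "is": "V", "schoon": "Adj",
           "het": "Pron", "koud": "Adj"}

rules, tagged = learn_rules(baseline_tags(words, lexicon), gold)
print(rules)
print(tagged)
```

On this toy input the learner finds a single rule that retags "was" as a noun when it follows an article. A real tagger uses many more rule templates (for example, conditions on surrounding words and on tags two or three positions away) and a much larger training corpus.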
References
[1] Uit den Boogaart (1975) Woordfrequenties in geschreven en gesproken Nederlands. Oosthoek, Scheltema & Holkema, Utrecht.
[2] Berghmans, J. (1994) WOTAN, een automatische grammatikale tagger voor het Nederlands. Dept. of Language and Speech, University of Nijmegen.
[3] Brill, E. (1995) Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, 21(4):543-565.