Alpino

Alpino is a dependency parser for Dutch that also analyses sentences in terms of constituents. In addition to the syntactic analysis, Alpino provides part-of-speech tagging, lemmatization and morphological tagging. The output is formatted in XML.

Example of an Alpino parse

Overview

  • Alpino parses, POS-tags, lemmatizes and morphologically analyses any Dutch sentence it is given. Its syntactic analysis of an input sentence yields a fully annotated syntactic tree containing both constituents and explicitly labelled syntactic relations.

  • Alpino is a rule-based parser with a statistics-based disambiguation component.

  • Alpino's grammar has been augmented to build structures based on the guidelines of CGN (Corpus of Spoken Dutch) and D-COI.

  • Alpino's output is formatted in XML, allowing it to be queried by formal query languages such as XPath. Tools such as PaQu and GrETEL leverage this feature and use Alpino in the background for querying purposes.

  • Developed by the University of Groningen, Alpino is available as a webservice hosted by Radboud University Nijmegen, but it can also be installed locally.
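To illustrate the point above about querying Alpino's XML output with XPath, here is a minimal Python sketch. The XML fragment is hand-written and simplified in the style of Alpino's output. It is an assumption made for illustration, not actual parser output, and the helper name `find_words` is hypothetical. The sketch uses only Python's standard library, whose `xml.etree.ElementTree` module supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment in the style of Alpino's XML output.
# Illustrative assumption only: real output carries more attributes.
SAMPLE = """
<alpino_ds version="1.3">
  <node id="0" rel="top" cat="top" begin="0" end="3">
    <node id="1" rel="--" cat="smain" begin="0" end="3">
      <node id="2" rel="su" pt="vnw" word="Ik" lemma="ik" begin="0" end="1"/>
      <node id="3" rel="hd" pt="ww" word="lees" lemma="lezen" begin="1" end="2"/>
      <node id="4" rel="obj1" pt="n" word="boeken" lemma="boek" begin="2" end="3"/>
    </node>
  </node>
  <sentence>Ik lees boeken</sentence>
</alpino_ds>
"""


def find_words(xml_text, xpath):
    """Return the word forms of all nodes matched by an XPath expression."""
    root = ET.fromstring(xml_text.strip())
    return [node.get("word") for node in root.findall(xpath)]


# All nodes with the dependency relation "su" (subject).
subjects = find_words(SAMPLE, ".//node[@rel='su']")
# All nodes tagged as nouns (pt="n").
nouns = find_words(SAMPLE, ".//node[@pt='n']")

print(subjects)
print(nouns)
```

For larger treebanks or more complex queries, dedicated tools such as PaQu and GrETEL are likely the more practical choice.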

Learn

Quick Use

Using Alpino is easiest with the webservice, but this requires one to log in using an institutional account. Once logged in, the user can create a new project, upload a tokenized or untokenized file (or input the text directly as plain text), and have Alpino parse all input sentences. The output consists of one FoLiA XML file for the entire input, as well as one file per input sentence in standard Alpino annotation.

For quick single-sentence parses, one can use the online demo or GrETEL. Neither option requires a log-in, and both display the resulting parse as a tree for quick inspection.

Local Installation

Alpino can also be installed locally. For this, we refer to the tool's general User Guide and GitHub page.

Using Alpino on Windows calls for some additional remarks, however. Anyone who wants to use Alpino on Windows should first read Daniël de Kok's blog post on the topic.

Annotation Guidelines

In the end, the hardest part about using Alpino is understanding its annotations. For a detailed description of the syntactic annotations used by Alpino, one should check the document: Lassy Syntactische Annotatie. For the annotation of parts-of-speech and lemmas, one should check the document: Part of speech tagging en lemmatisering van het D-coi corpus. These documents are, however, only available in Dutch.

The following document (in English) may also be useful: Manual for syntactic annotators.
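As a rough aid to reading the annotations, the Python sketch below prints an indented outline of a parse, showing each node's dependency relation together with its category (for constituents) or part-of-speech tag (for words). The XML fragment is hand-written and simplified, so it is an assumption for illustration rather than real Alpino output, and the function name `describe` is hypothetical. The guideline documents mentioned above remain the authoritative reference for what the labels mean.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment in the style of Alpino's XML output.
# Illustrative assumption only: real output carries more attributes.
SAMPLE = """
<alpino_ds version="1.3">
  <node id="0" rel="top" cat="top" begin="0" end="3">
    <node id="1" rel="--" cat="smain" begin="0" end="3">
      <node id="2" rel="su" cat="np" begin="0" end="2">
        <node id="3" rel="det" pt="lid" word="De" lemma="de" begin="0" end="1"/>
        <node id="4" rel="hd" pt="n" word="kat" lemma="kat" begin="1" end="2"/>
      </node>
      <node id="5" rel="hd" pt="ww" word="slaapt" lemma="slapen" begin="2" end="3"/>
    </node>
  </node>
  <sentence>De kat slaapt</sentence>
</alpino_ds>
"""


def describe(node, depth=0):
    """Return one indented line per node: relation, then category or POS tag."""
    # Constituents carry a "cat" attribute; word nodes carry "pt" instead.
    label = node.get("cat") or node.get("pt")
    line = "  " * depth + f"{node.get('rel')}: {label}"
    if node.get("word") is not None:
        line += f" ({node.get('word')} / {node.get('lemma')})"
    lines = [line]
    for child in node.findall("node"):
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE.strip())
outline = describe(root.find("node"))
print("\n".join(outline))
```

In this outline, nested indentation reflects constituent structure, while the label before each colon is the syntactic relation of that node to its parent.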

Mentions

  • The Lassy corpus was parsed with Alpino. The Lassy Klein subcorpus was manually corrected.
  • Van Noord, Gertjan, Bouma, Gosse, Van Eynde, Frank, De Kok, Daniël, Van der Linde, Jelmer, Schuurman, Ineke, Tjong Kim Sang, Erik, & Vandeghinste, Vincent (2013). Large scale syntactic annotation of written Dutch: Lassy. In Peter Spyns, & Jan Odijk (Eds.), Essential speech and language technology for Dutch: Results by the STEVIN programme (pp. 147-164). Springer Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30910-6

  • Press release on Alpino (in Dutch)

Publications

  • Van Noord, Gertjan. (2006, April 10–13). At Last Parsing Is Now Operational. In Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées (pp. 20–42). ATALA, Leuven. https://aclanthology.org/2006.jeptalnrecital-invite.2/

Webpages

  • Alpino home page
  • GitHub page
  • Alpino web demo
  • Alpino User Guide
  • Alpino on Windows

  • PaQu - Parse and Query makes it possible to search in syntactically annotated corpora in Dutch. PaQu uses the Alpino parser to make treebanks of your own text corpus, and to search in these treebanks.

  • GrETEL is a tool to query-by-example corpora and treebanks that were parsed by Alpino.

  • AlpinoGraph is a tool to query syntactically annotated corpora as graphs instead of treebanks, which allows for some additional flexibility.

  • SASTA, a tool for the semi-automatic analysis of spontaneous-language fragments of children with a specific language impairment (SLI), uses Alpino to analyse the utterances grammatically.

  • Redekundig.nl is a tool for Dutch high-school students that uses Alpino as a backend to classify parts of speech and grammatical functions of phrases in sentences (so-called "taalkundig ontleden" and "redekundig ontleden").

Credits and Contact Information

Alpino was developed in the context of the PIONIER Project Algorithms for Linguistic Processing.

Alpino was released under the GNU Lesser General Public License.