AlpinoGraph
AlpinoGraph is a tool to query syntactically annotated corpora. The tool makes use of AgensGraph, which combines database technology (PostgreSQL) with Cypher, a widely used query language for graph databases. Queries in AlpinoGraph are therefore a mix of SQL and Cypher. AlpinoGraph also provides some extensions of its own, such as a simple macro system and a visualization of query results.
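Macro systems of this kind usually work by plain text substitution: a named fragment is defined once and then spliced into a query wherever it is referenced, before the query is sent to the database. The following is a minimal sketch of that idea, not AlpinoGraph's actual implementation. The definition syntax `name = """body"""` and the reference syntax `%name%` are assumptions modeled on similar treebank tools, and the helper names `parse_macros` and `expand` are hypothetical.

```python
import re

# Assumed (hypothetical) macro syntax:
#   definitions:  name = """body"""
#   references:   %name%  inside a query
DEF_RE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"""(.*?)"""', re.S | re.M)
REF_RE = re.compile(r'%([A-Za-z_][A-Za-z0-9_]*)%')


def parse_macros(text: str) -> dict[str, str]:
    """Collect all macro definitions from a block of text."""
    return {name: body.strip() for name, body in DEF_RE.findall(text)}


def expand(query: str, macros: dict[str, str], max_depth: int = 10) -> str:
    """Replace %name% references; macros may refer to other macros.

    Unknown references are left untouched. Expansion is repeated until
    the text stops changing; if it is still changing after max_depth
    rounds, a cycle is assumed and ValueError is raised.
    """
    for _ in range(max_depth):
        expanded = REF_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), query)
        if expanded == query:
            return expanded
        query = expanded
    raise ValueError("macro expansion did not terminate (possible cycle)")


if __name__ == "__main__":
    defs = '''
verb = """pt: 'ww'"""
finite = """%verb%, wvorm: 'pv'"""
'''
    macros = parse_macros(defs)
    # Prints: match (n:word{pt: 'ww', wvorm: 'pv'}) return n
    print(expand("match (n:word{%finite%}) return n", macros))
```

The query string in the usage example is only an illustrative Cypher-style pattern; the node label and attribute names are assumptions and should be checked against AlpinoGraph's own documentation. Expanding repeatedly (rather than once) lets one macro build on another, while the depth limit keeps a self-referencing definition from looping forever.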
Overview
- AlpinoGraph is a tool to query syntactically annotated corpora as graphs rather than trees, which allows for extra flexibility.
- The query language used by AlpinoGraph is a mix of SQL and Cypher.
- AlpinoGraph makes use of Alpino's annotations as well as annotations following Universal Dependencies.
Data
AlpinoGraph ships with treebanks of, among others, Lassy Small, the CGN (Corpus Gesproken Nederlands) and the Eindhoven corpus. See this page for a full list (in Dutch).
Learn
Instruction
The key publication on AlpinoGraph provides detailed documentation of how AlpinoGraph works and how queries are formulated. Reading it is recommended.
Additionally, the AlpinoGraph page offers useful examples and explanations, accessible through the menu in the top-left corner of the page. For more extensive documentation, users are invited to visit AlpinoGraph's Help page. These pages are, however, only available in Dutch.
Alpino Annotations
AlpinoGraph makes use of the annotations and tagging provided by Alpino. For a detailed description of the syntactic annotations used by Alpino, see the document Lassy Syntactische Annotatie. For the annotation of parts of speech and lemmas, see the document Part of speech tagging en lemmatisering van het D-coi corpus. These documents are, however, only available in Dutch.
The following document (in English) may also be useful: Manual for syntactic annotators.
Universal Dependencies
AlpinoGraph also makes use of annotations and tagging following Universal Dependencies (UD), a programme that aims at cross-linguistically consistent tagging and dependency parsing. UD is an open community effort with over 500 contributors producing over 200 treebanks in over 100 languages. If you’re new to UD, you should start by reading the first part of the Short Introduction and then browsing the annotation guidelines on the UD website.
User support
AlpinoGraph was developed at the Center for Language and Cognition of the University of Groningen by Peter Kleiweg. Any issues can be reported on AlpinoGraph's GitHub.
Mentions
Key publications
- Peter Kleiweg and Gertjan van Noord. 2020. AlpinoGraph: A Graph-based Search Engine for Flexible and Efficient Treebank Search. In Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories, pages 151–161, Düsseldorf, Germany. Association for Computational Linguistics.
Webpages