Grammar Customization with the LinGO Grammar Matrix

Tutorial at LREC 2010

Valletta, Malta; 17 May 2010

Emily M. Bender, Antske Fokkens, and Safiyyah Saleem

Overview

This tutorial provides an overview of the LinGO Grammar Matrix customization system (Bender et al., 2002; 2010), a free web-based tool that serves as an easy entry point into developing broad-coverage grammars for those unfamiliar with grammar engineering and as a time-saving device for those with experience.

The Grammar Matrix customization system is a web-based service that elicits typological descriptions of languages and outputs customized grammar fragments suitable for sustained development into broad-coverage grammars. The resulting grammars use the formalism of Head-Driven Phrase Structure Grammar (HPSG; Pollard and Sag 1994), provide bidirectional mappings between surface strings and semantic representations in the format of Minimal Recursion Semantics (MRS; Copestake et al. 2005), and can be run and further developed within the LKB grammar development environment (Copestake 2002).

We intend this tutorial to be of interest to computational linguists of various stripes. Researchers in statistical NLP may find it interesting as a view into a structure-based approach to cross-linguistic variation. Experienced grammar engineers may find this overview useful for cross-framework comparison and/or the construction of multilingual resources similar to the Grammar Matrix but representing different frameworks. Theoretically oriented syntacticians can use the Grammar Matrix customization system for linguistic hypothesis testing (Bender, 2008), while typologists may be interested in it as a means of investigating the interaction of phenomena cross-linguistically.

Tutorial resources

Tutorial outline

Software/Links

The LinGO Grammar Matrix customization system is an online tool, but the grammars it creates are meant to be used with the LKB grammar development environment (Copestake 2002). If you would like to follow along with that part of the tutorial, we encourage you to install the software ahead of time. Since we also use [incr tsdb()] (Oepen 2001), we recommend the Linux version of the software.

We plan to include time for discussion of how to model languages other than our primary example language. If you have a language you would like us to consider, please prepare a test suite for that language. Test suite guidelines:

You may also wish to begin filling out the customization system questionnaire ahead of time.

References

Bender, Emily M. 2008. Grammar engineering for linguistic hypothesis testing. In Nicholas Gaylord, Alexis Palmer, and Elias Ponvert, editors, Proceedings of the Texas Linguistics Society X Conference: Computational Linguistics for Less-Studied Languages, pages 16-36, Stanford. CSLI Publications.

Bender, Emily M., Dan Flickinger, and Stephan Oepen. 2002. The Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development of Cross-Linguistically Consistent Broad-Coverage Precision Grammars. In Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, Taipei, Taiwan.

Bender, Emily M., Scott Drellishak, Antske Fokkens, Michael Wayne Goodman, Daniel P. Mills, Laurie Poulson, and Safiyyah Saleem. 2010. Grammar prototyping and testing with the LinGO Grammar Matrix customization system. In Proceedings of the ACL 2010 Software Demonstrations.

Copestake, Ann. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford, CA.

Copestake, Ann, Dan Flickinger, Carl Pollard, and Ivan A. Sag. 2005. Minimal Recursion Semantics: An Introduction. Research on Language & Computation, 3(4):281-332.

Oepen, Stephan. 2001. [incr tsdb()] — Competence and performance laboratory. User manual. Technical report, Saarbrücken, Germany.

Pollard, Carl and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. The University of Chicago Press, Chicago, IL.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. BCS-0644097. Additional support for Grammar Matrix development came from a gift to the Turing Center from the Utilika Foundation.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Last modified: Thu Apr 29 16:17:57 PDT 2010