Statistical Parsing of Morphologically Rich Languages

Workshop Programme

6th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2015)

Co-located with IWPT 2015, July 23rd in Bilbao, Basque Country, Spain

Workshop Programme and Final Paper versions

Invited Speaker: Daniel Zeman (Charles University, Prague),

"From the Jungle to a Park: Harmonizing Annotations across Languages"

Registration and Venue

SPMRL is co-located with IWPT, and there is a single registration for both events. See the IWPT registration page for further information. You can also find more information on accommodation as well as the overall program and IWPT accepted papers.

Workshop Program (July 23rd, 2015)

Time	Speaker, title
10:55	Opening
11:00 - 11:20	Matthieu Constant and Joseph Le Roux: Dependency Representations for Lexical Segmentation
11:20 - 11:40	Aitziber Atutxa, Nerea Ezeiza, Iakes Goenaga and Koldo Gojenola: Experiments on Semi-supervised Dependency Parsing of a Morphologically Rich Language
11:40 - 12:00	Daniel Dakota, Timur Gilmanov, Wen Li, Christopher Kuzma, Evgeny Kim, Noor Abo Mokh and Sandra Kübler: Do Free Word Order Languages Need More Treebank Data? Investigating Dative Alternation in German, English and Russian
12:00 - 12:30	Aniruddha Tammewar, Karan Singla, Bhasha Agrawal, Riyaz Bhat and Dipti Misra Sharma: Can Distributed Word Embeddings be an Alternative to Costly Linguistic Features: A Study on Parsing Hindi
12:30 - 14:00	Lunch break
14:00 - 15:30	Daniel Zeman (invited talk): From the Jungle to a Park: Harmonizing Annotations across Languages
15:30 - 16:00	Coffee break
16:00 - 16:20	Angelika Kirilin and Yannick Versley: What is hard in Universal Dependency Parsing?
16:20 - 17:00	Djamé Seddah (invited talk): The SPMRL 2013/2014 Shared Tasks
17:00 - 17:30	Discussion and closing

Invited talks:

Daniel Zeman (Charles University, Prague)

From the Jungle to a Park: Harmonizing Annotations across Languages

In this talk I will describe my work towards a universal representation of morphology and dependency syntax in treebanks of various languages. Not only is such harmonization advantageous for linguists who use corpora, it is also a prerequisite for cross-language parser adaptation techniques such as delexicalized parsing. I will present Interset, an interlingua-like tool to translate morphosyntactic representations between tagsets; I will also show how the features from Interset are used in a recent framework called Universal Dependencies. Some experiments with delexicalized parsing on harmonized data will be presented. Finally, I will discuss the extent to which various morphological features are important in the context of statistical dependency parsing.

Djamé Seddah (Paris-Sorbonne)

Overview of the SPMRL Shared Tasks: 2 years later, where are we now

In this presentation, we will present the outcomes of the two shared tasks on statistical parsing of morphologically rich languages held in 2013 and 2014. The tasks featured data sets from nine languages (Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish and Swedish), each available in both constituency and dependency annotation. Large unlabeled data sets were also made available in different forms (tagged, parsed, with morphological analysis), in the hope of boosting semi-supervised methods for MRL parsing.

We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios. Both shared tasks saw submissions from 20 teams. The parsing results were obtained in different input scenarios (gold, predicted, and raw) and evaluated using different protocols (cross-framework, cross-scenario, and cross-language). In particular, this was the first time a multilingual evaluation campaign reported on the execution of parsers in realistic, morphologically ambiguous settings.

Interestingly, the SPMRL data set has spread beyond its initial circle of interest and is now used as a common benchmark for constituent parsing as well as realistic dependency parsing evaluation.

(joint work with Reut Tsarfaty, Sandra Kübler and many contributors)

Organizers

Marie Candito (Univ. Paris Diderot / Alpage, France)
Jinho D. Choi (Emory University, US)
Yannick Versley (Univ. Heidelberg, Germany)

Program committee

Miguel Ballesteros (Univ. Pompeu Fabra, Spain)
Bernd Bohnet (Google, Inc.)
Özlem Cetinoglu (IMS Stuttgart, Germany)
Grzegorz Chrupala (Univ. Tilburg, Netherlands)
Matthieu Constant (Univ. Marne La Vallée, France)
Benoît Crabbé (Univ. Paris Diderot, France)
Gülsen Eryigit (Istanbul Technical Univ., Turkey)
Richárd Farkas (Univ. Szeged, Hungary)
Jennifer Foster (Dublin City Univ., Ireland)
Yoav Goldberg (Bar Ilan Univ., Israel)
Spence Green (Stanford Univ., USA)
Samar Husain (IIT Delhi, India)
Sandra Kübler (Indiana Univ., USA)
Joseph Le Roux (Univ. Paris-Nord, France)
John Lee (City Univ. Hong Kong)
Wolfgang Maier (HHU Düsseldorf, Germany)
Yuval Marton (Microsoft, USA)
David McClosky (IBM Research, USA)
Joakim Nivre (Uppsala Univ., Sweden)
Kemal Oflazer (Carnegie Mellon Univ., USA)
Ines Rehbein (Univ. Potsdam, Germany)
Djamé Seddah (Univ. Paris-Sorbonne, France)
Wolfgang Seeker (IMS Stuttgart, Germany)
Anders Søgaard (Copenhagen Univ., Denmark)
Lamia Tounsi (Dublin City Univ., Ireland)
Reut Tsarfaty (Weizmann Institute, Israel)
