Strong Learning of Probabilistic Tree Adjoining Grammars

Authors Alexander Clark Year 2021 Abstract The problem of learning syntactic structure is notoriously difficult, especially with mildly context-sensitive grammars. Existing approaches to learning these types of grammars are limited: they are not guaranteed to learn the correct grammars or structures, only the correct derived structures, whether these are trees or strings. Here we present some progress towards strong learning of these formalisms, extending a recent result on strong learning of probabilistic context-free grammars to a class of probabilistic context-free tree grammars that is weakly equivalent to Tree-Adjoining Grammars. [Read More]

Strong learning of some Probabilistic Multiple Context-Free Grammars

Authors Alexander Clark Year 2021 Abstract This paper presents an algorithm for strong learning of probabilistic multiple context-free grammars from a positive sample of strings generated by the grammars. The algorithm is shown to be a consistent estimator for a class of well-nested grammars, defined by explicit structural conditions on the underlying grammar; for grammars in this class it is guaranteed to converge to a grammar isomorphic to the original, not just one that generates the same set of strings. [Read More]

Consistent Unsupervised Estimators for Anchored PCFGs

Authors Clark, Alexander and Fijalkow, Nathanaël Year 2020 Abstract Learning probabilistic context-free grammars (PCFGs) from strings has been a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs that satisfy certain natural conditions, including being anchored (Stratos et al., 2016). We proceed via a reparameterization of (top-down) PCFGs that we call a bottom-up weighted context-free grammar. [Read More]
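On my reading of the anchoring condition (following Stratos et al., 2016, and assuming it transfers to nonterminals as the abstract suggests), a nonterminal is anchored if some terminal is generated by that nonterminal and no other. A minimal Python sketch of checking this condition, with a hypothetical toy lexicon that is not from the paper:

```python
# Sketch of the anchoring condition (assumed reading): a nonterminal A is
# anchored if some terminal a has a rule A -> a and no other nonterminal
# rewrites to a. A PCFG is anchored if every nonterminal has an anchor.

def anchors(lexical_rules):
    """Map each nonterminal to the terminals that anchor it.

    lexical_rules: iterable of (nonterminal, terminal) pairs for rules A -> a.
    """
    producers = {}                        # terminal -> nonterminals generating it
    for nt, t in lexical_rules:
        producers.setdefault(t, set()).add(nt)
    anchor_map = {nt: set() for nt, _ in lexical_rules}
    for t, nts in producers.items():
        if len(nts) == 1:                 # generated by exactly one nonterminal
            anchor_map[next(iter(nts))].add(t)
    return anchor_map

# Toy lexicon: 'dog' is ambiguous between N and V, so it anchors neither,
# but each nonterminal still has an unambiguous anchor.
rules = [("N", "dog"), ("N", "cat"), ("V", "barks"), ("V", "dog")]
print(anchors(rules))                           # {'N': {'cat'}, 'V': {'barks'}}
print(all(a for a in anchors(rules).values()))  # True: the grammar is anchored
```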

Distributional Learning of Context-Free and Multiple Context-Free Grammars

Authors Clark, Alexander and Yoshinaka, Ryo Year 2016 Abstract Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms for context-free grammars to the class of Parallel Multiple Context-Free Grammars (PMCFGs). These grammars have two additional operations beyond the simple context-free operation of concatenation: the ability to interleave strings of symbols, and the ability to copy or duplicate strings. This allows the grammars to generate some non-semilinear languages, which are outside the class of mildly context-sensitive grammars. [Read More]
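To make the copying operation concrete: a single copying rule already pushes the generated language outside the semilinear (and hence mildly context-sensitive) realm. A minimal Python sketch of a hypothetical one-rule copying grammar; the example is mine, not from the paper:

```python
# Hypothetical copying grammar:  A(a)  and  A(xx) :- A(x).
# Its language is { a^(2^n) : n >= 0 }, whose set of string lengths
# {1, 2, 4, 8, ...} is not semilinear.

def copying_language(max_len):
    """Enumerate the copying grammar's language up to length max_len."""
    derived = set()
    frontier = {"a"}                      # base rule A(a)
    while frontier:
        derived |= frontier
        # copying rule A(xx) :- A(x): duplicate every derived string
        frontier = {s + s for s in frontier if 2 * len(s) <= max_len}
        frontier -= derived
    return sorted(derived, key=len)

print(copying_language(16))   # strings of lengths 1, 2, 4, 8, 16
```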

Distributional Learning of Some Nonlinear Tree Grammars

Authors Clark, Alexander and Kanazawa, Makoto and Kobele, Gregory M. and Yoshinaka, Ryo Year 2016 Abstract A key component of Clark and Yoshinaka’s distributional learning algorithms is the extraction of substructures and contexts contained in the input data. This problem often becomes intractable with nonlinear grammar formalisms, because more than polynomially many substructures and/or contexts may be contained in each object. Previous work on distributional learning of nonlinear grammars avoided this difficulty by restricting the substructures or contexts that are made available to the learner. [Read More]
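The blow-up the abstract mentions can be seen already with strings: a length-n string has only O(n²) contiguous substrings, but the non-contiguous substructures relevant to nonlinear formalisms behave like subsequences, of which there can be up to 2^n. A quick illustrative computation; the example is mine, not from the paper:

```python
# O(n^2) contiguous substrings versus up to 2^n - 1 distinct non-empty
# subsequences of a length-n string with all symbols distinct.
from itertools import combinations

def substrings(s):
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def subsequences(s):
    return {"".join(c) for r in range(1, len(s) + 1) for c in combinations(s, r)}

s = "abcdefgh"                # length 8, all symbols distinct
print(len(substrings(s)))     # 36  = 8 * 9 / 2
print(len(subsequences(s)))   # 255 = 2**8 - 1
```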

An introduction to multiple context free grammars for linguists

Authors Clark, Alexander Year 2014 Abstract This is a gentle introduction to Multiple Context-Free Grammars (MCFGs), intended for linguists who are familiar with context-free grammars and movement-based analyses of displaced constituents, but unfamiliar with Minimalist Grammars or other mildly context-sensitive formalisms.
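To give a flavour of what MCFGs add: a nonterminal can derive a tuple of strings rather than a single string, which is how displaced material stays linked to its base position. A minimal sketch of a hypothetical 2-MCFG for a^n b^n c^n, a language no context-free grammar can generate; the example is mine, and the paper's own examples may differ:

```python
# Hypothetical 2-MCFG for { a^n b^n c^n : n >= 0 }:
#   S(x y)               :- A(x, y)
#   A('', '')
#   A('a'+x, 'b'+y+'c')  :- A(x, y)
# The nonterminal A derives a *pair* of strings whose parts grow in
# lockstep, something a single context-free nonterminal cannot do.

def derive(n):
    """Apply the recursive rule n times, then the start rule."""
    x, y = "", ""                       # base rule A('', '')
    for _ in range(n):
        x, y = "a" + x, "b" + y + "c"   # recursive rule on the pair
    return x + y                        # start rule S(x y)

print([derive(n) for n in range(4)])    # ['', 'abc', 'aabbcc', 'aaabbbccc']
```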
