Beyond Chomsky Normal Form: Extending Strong Learning Algorithms for PCFGs

Authors: Clark, Alexander. Year: 2021. Abstract: We extend a recent consistent strong learning algorithm for a subclass of probabilistic context-free grammars in Chomsky normal form (Clark and Fijalkow, 2020) to a much larger class of grammars by weakening the normal form constraints to allow a richer class of derivation structures that are not necessarily binary branching, including, among other possibilities, unary branching trees, which introduce some technical difficulties. By modifying the structural conditions appropriately, we obtain an algorithm that is computationally efficient and a consistent estimator for the class of grammars defined.
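As background to the normal-form constraint being relaxed here: Chomsky normal form restricts every rule to either two nonterminals on the right-hand side or a single terminal, which rules out exactly the unary and n-ary branching structures the paper admits. A minimal sketch (an illustrative helper, not code from the paper):

```python
# Background sketch (not from the paper): Chomsky normal form (CNF)
# restricts every rule to A -> B C (two nonterminals) or A -> a
# (a single terminal). The relaxed normal forms discussed above also
# admit unary and n-ary branching rules.

def is_cnf_rule(lhs, rhs, nonterminals):
    """Check whether a single rule lhs -> rhs is in Chomsky normal form."""
    if len(rhs) == 2:
        # Binary branching: both right-hand symbols must be nonterminals.
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:
        # Lexical rule: the single right-hand symbol must be a terminal.
        return rhs[0] not in nonterminals
    return False

# Toy grammar: binary and lexical rules pass; a unary rule S -> NP
# (the kind of structure the extended algorithm must handle) fails.
nts = {"S", "NP", "VP"}
print(is_cnf_rule("S", ["NP", "VP"], nts))  # True
print(is_cnf_rule("NP", ["dogs"], nts))     # True
print(is_cnf_rule("S", ["NP"], nts))        # False: unary branching
```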

Strong Learning of Probabilistic Tree Adjoining Grammars

Authors: Alexander Clark. Year: 2021. Abstract: The problem of learning syntactic structure is notoriously difficult, especially with mildly context-sensitive grammars. Existing approaches to learning these types of grammars are limited: they are not guaranteed to learn the correct grammars or structures, only the correct derived structures, whether these are trees or strings. Here we present some progress towards strong learning of these formalisms, extending a recent result on strong learning of probabilistic context-free grammars to a class of probabilistic context-free tree grammars that is weakly equivalent to Tree-Adjoining Grammars.

Strong Learning of Some Probabilistic Multiple Context-Free Grammars

Authors: Alexander Clark. Year: 2021. Abstract: This paper presents an algorithm for strong learning of probabilistic multiple context-free grammars from a positive sample of strings generated by the grammars. The algorithm is shown to be a consistent estimator for a class of well-nested grammars, given by explicit structural conditions on the underlying grammar; for grammars in this class it is guaranteed to converge to a grammar that is isomorphic to the original, not just one that generates the same set of strings.

Consistent Unsupervised Estimators for Anchored PCFGs

Authors: Clark, Alexander and Fijalkow, Nathanaël. Year: 2020. Abstract: Learning probabilistic context-free grammars (PCFGs) from strings has been a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs satisfying certain natural conditions, including being anchored (Stratos et al., 2016). We proceed via a reparameterization of (top-down) PCFGs that we call a bottom-up weighted context-free grammar.