Logical grammars, logical theories

Authors Clark, Alexander Year 2012 Abstract Residuated lattices form one of the theoretical backbones of the Lambek Calculus as the standard free models. They also appear in grammatical inference as the syntactic concept lattice, an algebraic structure canonically defined for every language L based on the lattice of all distributionally definable subsets of strings. Recent results show that it is possible to build representations, such as context-free grammars, based on these lattices, and that these representations will be efficiently learnable using distributional learning. [Read More]

A Language Theoretic Approach to Syntactic Structure

Authors Clark, Alexander Year 2011 Abstract We consider the idea of defining syntactic structure relative to a language, rather than to a grammar for a language. This allows us to define a notion of hierarchical structure that is independent of the particular grammar, and that depends rather on the properties of various algebraic structures canonically associated with a language. Our goal is not necessarily to recover the traditional ideas of syntactic structure invented by linguists, but rather to come up with an objective notion of syntactic structure that can be used for semantic interpretation. [Read More]

Inference of Inversion Transduction Grammars

Authors Alexander Clark Year 2011 Abstract We present the first polynomial algorithm for learning a class of inversion transduction grammars (ITGs) that implement context free transducers – functions from strings to strings. The class of transductions that we can learn properly includes all subsequential transductions. These algorithms are based on a generalisation of distributional learning; we prove correctness of our algorithm under an identification in the limit model. Links link [Read More]

{Linguistic Nativism and the Poverty of the Stimulus}

Authors Alexander Clark and Shalom Lappin Year 2011 Abstract This unique contribution to the ongoing discussion of language acquisition considers the Argument from the Poverty of the Stimulus in language learning in the context of the wider debate over cognitive, computational, and linguistic issues. Critically examines the Argument from the Poverty of the Stimulus - the theory that the linguistic input which children receive is insufficient to explain the rich and rapid development of their knowledge of their first language(s) through general learning mechanisms Focuses on formal learnability properties of the class of natural languages, considered from the perspective of several learning theoretic models The only current book length study of arguments for the poverty of the stimulus which focuses on the computational learning theoretic aspects of the problem. [Read More]

Distributional Learning of some Context-free Languages with a Minimally Adequate Teacher

Authors Alexander Clark Year 2010 Abstract Angluin showed that the class of regular languages could be learned from a Minimally Adequate Teacher (MAT) providing membership and equivalence queries. Clark and Eyraud (2007) showed that some context free grammars can be identified in the limit from positive data alone by identifying the congruence classes of the language. In this paper we consider learnability of context free languages using a MAT. We show that there is a natural class of context free languages, that includes the class of regular languages, that can be polynomially learned from a MAT, using an algorithm that is an extension of Angluin’s LSTAR algorithm. [Read More]

Learning Context Free Grammars with the Syntactic Concept Lattice

Authors Clark, Alexander Year 2010 Abstract The Syntactic Concept Lattice is a residuated lattice based on the distributional structure of a language; the natural representation based on this is a context sensitive formalism. Here we examine the possibility of basing a context free grammar (cfg) on the structure of this lattice; in particular by choosing non-terminals to correspond to concepts in this lattice. We present a learning algorithm for context free grammars which uses positive data and membership queries, and prove its correctness under the identification in the limit paradigm. [Read More]

The Handbook of Computational Linguistics and Natural Language Processing

Editors Alexander Clark and Chris Fox and Shalom Lappin Year 2010 Abstract This comprehensive reference work provides an overview of the concepts, methodologies, and applications in computational linguistics and natural language processing (NLP). Features contributions by the top researchers in the field, reflecting the work that is driving the discipline forward Includes an introduction to the major theoretical issues in these fields, as well as the central engineering applications that the work has produced Presents the major developments in an accessible way, explaining the close connection between scientific understanding of the computational properties of natural language and the creation of effective language technologies Serves as an invaluable state-of-the-art reference source for computational linguists and software engineers developing NLP applications in industrial research and development labs of software companies. [Read More]

Three Learnable Models for the Description of Language

Authors Clark, Alexander Year 2010 Abstract Learnability is a vital property of formal grammars: representation classes should be defined in such a way that they are learnable. One way to build learnable representations is by making them objective or empiricist: the structure of the representation should be based on the structure of the language. Rather than defining a function from representation to language we should start by defining a function from the language to the representation: following this strategy gives classes of representations that are easy to learn. [Read More]

Unsupervised learning and grammar induction

Authors Clark, Alexander and Lappin, Shalom Year 2010 Abstract In this chapter we consider unsupervised learning from two perspectives. First, we briefly look at its advantages and disadvantages as an engineering technique applied to large corpora in natural language processing. While supervised learning generally achieves greater accuracy with less data, unsupervised learning offers significant savings in the intensive labor required for annotating text. Second, we discuss the possible relevance of unsupervised learning to debates on the cognitive basis of human language acquisition. [Read More]

Another Look at Indirect Negative Evidence

Authors Clark, Alexander and Lappin, Shalom Year 2009 Abstract Indirect negative evidence is clearly an important way for learners to constrain overgeneralisation, and yet a good learning theoretic analysis has yet to be provided for this, whether in a PAC or a probabilistic identification in the limit framework. In this paper we suggest a theoretical analysis of indirect negative evidence that allows the presence of ungrammatical strings in the input and also accounts for the relationship between grammaticality/acceptability and probability. [Read More]