Authors
Clark, Alexander
Year
2000
Abstract
This paper addresses the automatic induction of syntactic categories from unannotated corpora. Previous techniques give good results but cope poorly with ambiguous or rare words. It presents an algorithm, context distribution clustering (CDC), that extends naturally to handle these problems.