Geoffrey Sampson


English for the Computer:
The SUSANNE Corpus and analytic scheme

by Geoffrey Sampson

In the past, progress in language engineering has been hampered by the fact that every research group working on English has its own way of describing the structure of the language. None of these traditions has been defined explicitly enough to allow data to be exchanged between sites while preserving its meaning.

English for the Computer is a reference book which for the first time offers an explicit, comprehensive scheme for representing the structural properties of real-life samples of the language. The definitions are intended to be sufficiently rigorous that two analysts who annotate the same example independently should normally produce identical output.

The SUSANNE project from which this book emerged also produced a 130,000-word electronic sample of English, the SUSANNE Corpus, annotated in conformity with the scheme; this resource is freely available for download by anonymous FTP (see my Resources page).

Some critical comment:

[Sampson’s] book aims to give linguistics what Linnaeus gave biology: a systematic, thorough-going naming of parts, in which everything encountered is classified. Early evidence is that Sampson’s book and associated computer files are indeed meeting a need.
Times Higher Education Supplement

the detail ... is unrivalled ... a very useful resource for the linguistic research community, and the SUSANNE Team and the funding agencies which have supported it richly deserve our thanks.
— D. Terence Langendoen (President, Linguistic Society of America) in Language

this impressive monograph ... covers every aspect of annotation
International Journal of Corpus Linguistics

ix + 499 pp.

Published by Clarendon Press (the scholarly imprint of Oxford University Press), 1995.

ISBN 0-19-824023-6


last changed 5 Jan 2005