Accession Number : ADA273556


Title : Building a Large Annotated Corpus of English: The Penn Treebank


Descriptive Note : Final rept. 1 Nov 1989-30 Apr 1993


Corporate Author : MOORE SCHOOL OF ELECTRICAL ENGINEERING PHILADELPHIA PA DEPT OF COMPUTER AND INFORMATION SCIENCES


Personal Author(s) : Marcus, Mitch


Full Text : http://www.dtic.mil/dtic/tr/fulltext/u2/a273556.pdf


Report Date : 30 Apr 1993


Pagination or Media Count : 25


Abstract : As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, with over 3 million words of that material assigned skeletal grammatical structure. This material now includes a fully hand-parsed version of the classic Brown corpus. About one half of the papers at the ACL Workshop on Using Large Text Corpora this past summer were based on the materials generated by this grant.


Descriptors : *COMPUTATIONAL LINGUISTICS , *NATURAL LANGUAGE , MATERIALS , STRUCTURES , WORKSHOPS , GRANTS , HANDS , SUMMER , SPEECH , OILS


Subject Categories : Linguistics


Distribution Statement : APPROVED FOR PUBLIC RELEASE