Introduction
Proposition Bank I was produced by the Linguistic Data
Consortium (LDC), catalog number LDC2004T14, ISBN 1-58563-304-6.
This is a semantic annotation of the Wall Street Journal section of
Treebank-2. More specifically, each verb occurring in the
Treebank has been treated as a semantic predicate and the surrounding
text has been annotated for arguments and adjuncts of the predicate.
The verbs have also been tagged with coarse-grained senses and with
inflectional information. This work was done in the Computer and
Information Sciences Department at the University of Pennsylvania.
All data is the result of double-blind, adjudicated annotation.
Data
There are two basic components to Propbank:
- The Verb Lexicon. A frames file, consisting of one or more frame sets, has been created for
each verb occurring in the Treebank. These files serve as a reference for the annotators and
for users of the data. 3,324 such files have been created, totalling about 5.5 MB of uncompressed data.
- The Annotation. There are approximately 113,000 annotated verb tokens. These verb tokens include
all those occurring in over one million words of the Wall Street Journal section of the Penn Treebank, excluding
'be' and auxiliary uses of 'do' and 'have.' There are annotations for over 3,200 unique verbs. These annotations
are stored in a single file in standoff format, totalling ~9.6 MB of uncompressed data.
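As a rough illustration of how the standoff annotation file might be read, the following Python sketch parses one annotation line into its parts. It assumes a whitespace-separated layout of Treebank file, sentence index, verb terminal index, annotator field, frame set identifier, inflection string, and a list of pointer-label pairs; that layout is not specified in this document and should be checked against the distributed data. The sample line, the PropInstance class, and the parse_prop_line function are hypothetical names and values made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PropInstance:
    tree_file: str                    # Treebank file the annotation points into
    sentence: int                     # sentence index within that file
    terminal: int                     # terminal (token) index of the verb
    tagger: str                       # annotator field
    roleset: str                      # frame set identifier, e.g. "join.01"
    inflection: str                   # inflectional information for the verb
    arguments: List[Tuple[str, str]]  # (tree pointer, label) pairs


def parse_prop_line(line: str) -> PropInstance:
    """Parse one line, assuming the whitespace-separated layout described above."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    tree_file, sentence, terminal, tagger, roleset, inflection = fields[:6]
    arguments = []
    for item in fields[6:]:
        # Split at the first hyphen only, so labels such as "ARGM-MOD" stay whole.
        pointer, _, label = item.partition("-")
        arguments.append((pointer, label))
    return PropInstance(tree_file, int(sentence), int(terminal),
                        tagger, roleset, inflection, arguments)


# Illustrative sample line (an assumption, not quoted from this document).
SAMPLE = ("wsj/00/wsj_0001.mrg 0 8 gold join.01 vf--a "
          "0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1 11:1-ARGM-PRD 15:1-ARGM-TMP")

instance = parse_prop_line(SAMPLE)
print(instance.roleset, instance.arguments)
```

Because the annotation is standoff, each pointer only identifies a node in the corresponding Treebank parse; the words themselves have to be recovered from the Treebank files, which this sketch does not attempt.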
Updates
Please check the Propbank homepage for updates, tools, annotation guidelines, and published papers.
Sponsorship
This work was funded by DoD grant MDA904-00C-2136, NSF grant IIS-9800658,
and the Institute for Research in Cognitive Science at the University of
Pennsylvania (NSF-STC grant SBR-89-20230).
Content Copyright
Portions © 2004 Trustees of the University of Pennsylvania