Regina Barzilay
Computer Science
MIT
Friday, March 24, 12-2 p.m.

Learning to Model Text Structure
Discourse models capture relations across sentences in a document. These models are crucial in applications that must generate coherent text. To date, rule-based approaches have predominated in discourse research. However, such models are hard to incorporate as-is into modern systems: they rely on handcrafted rules that are valid only for limited domains, with no guarantee of scalability or portability.

In this talk, I will present discourse models that can be learned effectively from a collection of unannotated texts. The key premise of our work is that the distribution of entities in coherent texts exhibits certain regularities. These models operate over an automatically computed representation that reflects distributional, syntactic, and referential information about discourse entities. This representation allows us to induce the properties of coherent texts from a given corpus, without recourse to manual annotation or a predefined knowledge base. To conclude, I will show how these models can be integrated effectively into statistical generation and summarization systems.
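To make the idea of an entity-based representation concrete, here is a minimal sketch in the spirit of an entity grid: rows are sentences, columns are discourse entities, and each cell records the entity's grammatical role in that sentence. The toy document, role labels ('S' subject, 'O' object, 'X' other, '-' absent), and function names below are illustrative assumptions, not the talk's actual implementation.

```python
from collections import Counter

# Hypothetical mini-document: each sentence maps entity -> grammatical role
# ('S' = subject, 'O' = object, 'X' = other mention).
doc = [
    {"Microsoft": "S", "profits": "O"},
    {"Microsoft": "S"},
    {"profits": "S", "Microsoft": "X"},
]

def entity_grid(doc):
    """Build the grid: one row per sentence, one column per entity.
    A cell holds the entity's role in that sentence, or '-' if absent."""
    entities = sorted({e for sent in doc for e in sent})
    grid = [[sent.get(e, "-") for e in entities] for sent in doc]
    return entities, grid

def transition_probs(grid):
    """Estimate probabilities of role bigrams by reading each entity's
    column top to bottom and counting adjacent role pairs."""
    counts = Counter()
    for col in zip(*grid):  # iterate over columns (one per entity)
        for a, b in zip(col, col[1:]):
            counts[(a, b)] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

entities, grid = entity_grid(doc)
probs = transition_probs(grid)
```

Distributions of such role transitions differ between coherent and incoherent sentence orderings, which is what lets these regularities be induced from raw text without manual annotation.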

This is joint work with Mirella Lapata and Lillian Lee.