Toward abstractive multi-document summarization using submodular function-based framework, sentence compression and merging

Date
2016
Authors
Tanvee, Moin Mahmud
University of Lethbridge. Faculty of Arts and Science
Publisher
Lethbridge, Alta : University of Lethbridge, Dept. of Mathematics and Computer Science
Abstract
Automatic multi-document summarization is the process of generating a summary that contains the most important information from multiple documents. In this thesis, we design an automatic multi-document summarization system that combines abstraction-based methods with submodularity. Our proposed model treats summarization as a budgeted submodular function maximization problem. The model integrates three important measures of a summary, namely importance, coverage, and non-redundancy, and we design a submodular function for each of them. In addition, we integrate sentence compression and sentence merging. When evaluated on the DUC 2004 data set, our generic summarizer outperforms the state-of-the-art summarization systems in terms of ROUGE-1 recall and F1-measure. For query-focused summarization, we use the DUC 2007 data set, where our system achieves results statistically similar to those of several well-established methods in terms of the ROUGE-2 measure.
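As a rough illustration of what budgeted submodular maximization looks like in practice, the sketch below selects sentences greedily by marginal gain per unit cost until a word budget is exhausted. It is not the thesis's implementation: the objective `word_coverage` (distinct words covered) is a toy stand-in for the importance, coverage, and non-redundancy functions, the scaling exponent `r` and all function names are assumptions made for this example, and sentence compression and merging are left out.

```python
from typing import Callable, FrozenSet, List, Sequence


def word_coverage(sentences: Sequence[str]) -> Callable[[FrozenSet[int]], float]:
    """Toy monotone submodular objective (an assumption, not the thesis's):
    the number of distinct words covered by the selected sentences."""
    bags = [set(s.lower().split()) for s in sentences]

    def f(selected: FrozenSet[int]) -> float:
        covered = set()
        for i in selected:
            covered |= bags[i]
        return float(len(covered))

    return f


def greedy_budgeted(sentences: Sequence[str], budget: int, r: float = 1.0) -> List[int]:
    """Greedily pick sentence indices under a word budget.

    At each step, choose the affordable sentence with the largest
    marginal gain divided by cost**r; stop when nothing adds value.
    """
    f = word_coverage(sentences)
    costs = [len(s.split()) for s in sentences]  # cost = word count
    selected: List[int] = []
    spent = 0
    remaining = set(range(len(sentences)))

    while remaining:
        current = f(frozenset(selected))
        best, best_ratio = None, 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            if costs[i] == 0 or spent + costs[i] > budget:
                continue  # skip empty or unaffordable sentences
            gain = f(frozenset(selected + [i])) - current
            ratio = gain / (costs[i] ** r)
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # no affordable sentence has positive gain
        selected.append(best)
        spent += costs[best]
        remaining.discard(best)

    return selected


if __name__ == "__main__":
    docs = [
        "the cat sat",
        "the cat sat on the mat",
        "dogs bark loudly",
        "the cat",
    ]
    print(greedy_budgeted(docs, budget=6))
```

With a budget of six words, the sketch picks two short sentences that together cover six distinct words rather than the single six-word sentence that covers only five. Published greedy schemes of this kind usually also compare the result against the best single affordable sentence to obtain an approximation guarantee; that step is omitted here for brevity.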
Keywords
automatic text summarization, abstraction-based, submodular function, generic-focused summarization, query-focused summarization, greedy algorithm, Natural language processing (Computer science) -- Research, Querying (Computer science), Database searching, Parsing (Computer grammar), Information retrieval, Question-answering systems -- Research, Computer science -- Mathematics
Citation