Semi-extractive multi-document summarization

Ghiyafeh Davoodi, Fatemeh; University of Lethbridge. Faculty of Arts and Science

Files

GHIYAFEH_DAVOODI_FATEMEH_MSC_2015.pdf(547.63 KB)

Date

2015

Authors

Ghiyafeh Davoodi, Fatemeh

University of Lethbridge. Faculty of Arts and Science

Publisher

Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science

Abstract

In this thesis, I design a model for extractive multi-document summarization based on the Maximum Coverage problem with a KnaPsack constraint (MCKP). The model integrates three measures to detect important sentences: Coverage, which rewards sentences according to how well they represent the whole document; Relevance, which favors sentences related to the given query; and Compression, which rewards concise sentences. To generate a summary, I apply an efficient and scalable greedy algorithm. The algorithm yields a near-optimal solution when its scoring functions are monotone non-decreasing and submodular. I use the DUC 2007 dataset to evaluate the proposed method. Evaluation with the ROUGE package shows improvement over two closely related works. The experimental results illustrate that three choices lead to better summaries: integrating compression into the MCKP-based model, applying semantic similarity measures to compute Relevance, and defining all scoring functions as monotone submodular functions.
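As a rough illustration of the greedy selection idea described in the abstract, the following minimal Python sketch picks sentences under a length budget. It is not the thesis's model. The function name greedy_summary, the whitespace tokenization, the word-count cost, and the single scoring function (number of distinct words covered) are all assumptions made for this example; the Relevance and Compression measures are omitted. Distinct-word coverage is monotone non-decreasing and submodular, but the near-optimality guarantee mentioned above depends on details of the algorithm that this sketch does not reproduce.

```python
def greedy_summary(sentences, budget):
    """Pick sentences under a word budget, greedily maximizing word coverage.

    Hypothetical simplification: coverage is the number of distinct words
    covered, and a sentence's cost is its length in words.
    """
    # Naive tokenization: each sentence becomes (set of words, cost in words).
    items = []
    for s in sentences:
        tokens = s.lower().split()
        items.append((set(tokens), len(tokens)))

    covered = set()      # words covered by the summary so far
    chosen = []          # indices of selected sentences, in selection order
    remaining = budget   # words still available in the knapsack

    while True:
        best, best_ratio = None, 0.0
        for i, (words, cost) in enumerate(items):
            # Skip sentences already chosen, empty, or too long to fit.
            if i in chosen or cost == 0 or cost > remaining:
                continue
            gain = len(words - covered)   # marginal coverage gain
            ratio = gain / cost           # gain per unit of budget
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits, or nothing adds new coverage
        chosen.append(best)
        covered |= items[best][0]
        remaining -= items[best][1]

    return [sentences[i] for i in chosen]


docs = ["the cat sat", "the cat sat on the mat", "dogs bark loudly"]
print(greedy_summary(docs, budget=6))
```

Ranking candidates by gain per unit cost, rather than by raw gain, keeps one long sentence from consuming the whole budget when several short sentences would cover more.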

Keywords

greedy algorithm, knapsack, maximum coverage, multi-document, summarization

URI

https://hdl.handle.net/10133/3759

Collections

Arts and Science, Faculty of
University of Lethbridge Theses
