Unsupervised Multi-document Summarization with Holistic Inference

Abstract

Multi-document summarization aims to obtain core information from a collection of documents written on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. Our method incorporates the holistic beam search inference method associated with the holistic measurements, named Subset Representative Index (SRI). SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners. To demonstrate the effectiveness of our method, we conduct extensive experiments on both small and large-scale multi-document summarization datasets under both unsupervised and adaptive settings. The proposed method outperforms strong baselines by a significant margin, as indicated by the resulting ROUGE scores and diversity measures. Our findings also suggest that diversity is essential for improving multi-document summary performance.

Publication
In Findings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Kaiqiang Song
Kaiqiang Song
Senior Research Scientist

Kaiqiang Song (宋凯强) is a Senior Research Scientist at Tencent AI Lab, Seattle, specializing in Natural Language Processing. His research focuses on advancing artificial intelligence through machine learning, NLP, and large language models. He is dedicated to optimizing AI model architectures for practical applications like text summarization and text generation, bridging the gap between foundational AI research and real-world impact.