Multi-Document Abstractive Text Summarization through Semantic Similarity Matrix for Telugu Language


  • D Naga Sudha
  • Y Madhavee Latha


Telugu is one of the popular south Indian languages which is currently spoken by 84 million population in Telangana and Andhra Pradesh. Text summarization is an area of research with a goal to provide short text from huge text documents. Extractive text summarization methods have been extensively studied by many researchers. There are various type of multi document ranging from different formats to domains and topic specific. With the application of neural networks for text generation, interest for research in abstractive text summarization has increased significantly. This approach has been attempted for Telugu language in this article. Recurrent neural networks are a subtype of recursive neural networks which try to predict the next sequence based on the current state and considering the information from previous states. The use of neural networks allows generation of summaries for long text sentences as well.  The work implements semantic based filtering using a similarity matrix while keeping all stop-words. The similarity is calculated using semantic concepts and Jiang Similarity and making use of a Recurrent Neural Network (RNN) with an attention mechanism to generate summary. ROUGE score is used for measuring the performance of the applied method on Telugu Language.