Viterbi Based Document Classification with Long Short Term Memory
Nowadays, the number of published documents is growing rapidly. For efficient retrieval, documents need to be classified by their similarity to a particular domain and then ranked. Such document classification is valuable in information retrieval systems because it reduces retrieval time. Many classification methods exist, including the Naive Bayes classifier, Support Vector Machines, Artificial Neural Networks, Latent Semantic Indexing, and the K-Nearest Neighbor algorithm. In this paper we use Latent Semantic Indexing (LSI), one of the best approaches to document classification, which relies on a mathematical method called Singular Value Decomposition (SVD). Part-of-speech (POS) tagging is also used in this type of classification and helps improve the performance of the system. We use the Viterbi algorithm together with Long Short Term Memory (LSTM) to find the POS tag of each word in the document. LSTM and bidirectional LSTM are also compared in combination with different POS tag identification algorithms, and the best method is used in the architecture of the proposed classification system.
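As an illustration of the Viterbi decoding step mentioned above, the following is a minimal sketch of Viterbi POS tagging over a hidden Markov model. It is not the system proposed in this paper: the abstract does not state how the LSTM scores are combined with Viterbi, so the sketch uses the plain HMM formulation. The tag set, the toy probabilities, the function name `viterbi`, and the `unk` smoothing value for unseen words are all assumptions made for the example.

```python
import math

# Toy HMM parameters (hypothetical values, chosen only for illustration).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.8, "runs": 0.2},
}


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under the HMM.

    `unk` is an assumed small probability for word/tag pairs that do not
    appear in `emit_p`, so that the logarithm is always defined.
    """
    if not words:
        return []

    # best[t][tag]: log-probability of the best path ending in `tag` at t.
    # back[t][tag]: the previous tag on that best path.
    best = [{}]
    back = [{}]
    for tag in tags:
        emit = emit_p[tag].get(words[0], unk)
        best[0][tag] = math.log(start_p[tag]) + math.log(emit)
        back[0][tag] = None

    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            emit = math.log(emit_p[tag].get(words[t], unk))
            # Pick the predecessor tag that maximises the path score.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][tag])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[t][tag] = score + emit
            back[t][tag] = prev

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# prints ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences. In a neural tagger, the emission scores could plausibly come from an LSTM instead of a lookup table, but that coupling is an assumption here and is not taken from the text.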