A Bigdata Approach to Auditing Documents


  • Geun-Won Kim
  • Seong-Taek Park
  • Jinhwa Kim


Establishment and focus: The purpose of this study is to propose an automatic document audit system based on big data techniques. In this paper, 200 budget request documents are collected as experimental data. Text mining is used to analyze the documents and to extract the major keywords related to the requested budgets. The documents are divided into training data and test data. A neural network, a support vector machine, and regression analysis are each used to build and test a model. Finally, the performance of the three methods is compared to find the best model. The test confirms that big data techniques can be applied to document auditing. The approach may also be applicable to similar problems such as lie detection and defect detection.
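As a rough illustration of the keyword extraction step, the sketch below ranks terms in each document by TF-IDF using only the Python standard library. The sample documents, the stopword list, and the function names (`tokenize`, `tfidf_keywords`) are hypothetical, and TF-IDF is an assumed weighting scheme; the study does not state which text mining method or tool it used.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for budget request documents (not the study's data).
DOCS = [
    "request budget for new server hardware and network equipment",
    "budget request for staff travel and conference travel fees",
    "request for office furniture budget and office supplies",
]

# Assumed, deliberately tiny stopword list for this toy corpus.
STOPWORDS = {"for", "and", "the", "of", "a", "to", "new"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


keywords = tfidf_keywords(DOCS)
for doc, terms in zip(DOCS, keywords):
    print(terms, "<-", doc)
```

Terms that occur in every document, such as "budget" and "request" here, receive a score of zero, so the ranking favors words that distinguish one request from another.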
System: This study proposes models for document auditing using big data techniques such as text mining and data mining. Predicting the cost stated in a bill or budget is used as the test problem. Documents containing bill or budget costs are analyzed with text mining techniques. Three data mining techniques, namely a neural network, logistic regression, and a support vector machine, are used to predict the target values. The performance of the three methods is measured and compared. Of the three, the support vector machine shows the best performance.
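The comparison procedure can be sketched as a hold-out evaluation: fit each model on the training split, score it on the test split, and keep the best scorer. The example below is a minimal sketch under several assumptions. The keyword-count features and binary labels are invented, the target is assumed to be a binary class, and only two of the three techniques are shown (logistic regression and a linear support vector machine, both fitted by simple gradient steps); the neural network is omitted for brevity. None of the names or hyperparameters come from the study.

```python
import math

# Hypothetical keyword-count features per document, e.g.
# [count("travel"), count("equipment")], with a made-up binary label.
# These values are illustrative only and are not the study's data.
TRAIN = [([3, 0], 1), ([2, 1], 1), ([4, 1], 1),
         ([0, 2], 0), ([1, 3], 0), ([0, 4], 0)]
TEST = [([3, 1], 1), ([0, 3], 0), ([2, 0], 1), ([1, 4], 0)]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(data, epochs=200, lr=0.1):
    """Logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))
            err = p - y  # gradient of the log loss with respect to the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return lambda x: 1 if dot(w, x) + b >= 0 else 0


def train_linear_svm(data, epochs=200, lr=0.01, reg=0.01):
    """Linear SVM fitted by subgradient descent on the regularized hinge loss."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            t = 1 if y == 1 else -1  # hinge loss expects labels in {-1, +1}
            margin = t * (dot(w, x) + b)
            # L2 shrinkage applies on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            # The hinge term contributes only when the margin is violated.
            if margin < 1:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return lambda x: 1 if dot(w, x) + b >= 0 else 0


def accuracy(predict, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)


models = {
    "logistic_regression": train_logistic(TRAIN),
    "linear_svm": train_linear_svm(TRAIN),
}
results = {name: accuracy(model, TEST) for name, model in models.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

On a toy set this small the scores say nothing about which method is better in general; the point is only the shape of the procedure, where every model sees the same training split and is judged on the same held-out documents.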