Data Preprocessing for Anomaly Detection
Abstract
Securing data is challenging in the business sector because the data is accessible in cyberspace. Data is among an organization's most valuable assets. Insider threats can be detected from the anomalous behavior of internal users, which requires dividing the data into two classes: normal and abnormal. It is therefore necessary to identify the specific features on which a researcher can train a model on the dataset, perform analysis, and determine whether the observed behavior could develop into a cyberattack. A cyberattack may result in data leakage, damage, theft, or sharing with external parties. Such incidents may cause considerable loss, damage the organization's image or credibility, or even force it to close permanently. This paper proposes a data preprocessing process for insider threat detection based on user behavior. It includes a survey of existing data sources, an assessment of data quality, selection of datasets for insider threat detection, data cleaning, feature extraction, and a check of data relevance for further implementation. Data preprocessing helps achieve accurate and consistent results during implementation.