Accurate Detection of Text Areas Using Fused Characteristics


  • Seok-Woo Jang
  • Sang-Hong Lee


Establishment and focus: Text in an image carries important information about the image's meaning. Accurately segmenting the characters contained in an image is therefore a prerequisite for character recognition.
System: This study introduces an approach for extracting text regions from stereoscopic images based on texture and depth features. The method first segments candidate text areas using texture features. After the character regions are localized, the background is separated from each localized character string. Finally, the depth feature is used to verify that the obtained text areas contain only text and no non-text regions. In the experiments, the proposed algorithm detected character regions in input color images more accurately than the existing algorithm. To evaluate the proposed character region acquisition method, we used a correctness metric, expressed as a percentage: the ratio of exactly localized strings (those that contain no non-character areas) to the total number of localized strings. For performance comparison, we also implemented a conventional neural network-based text detection method. In general, when text sits on an artificially inserted background, the background is simple, so the text area is binarized accurately. For text without an artificial background, the existing method has difficulty separating the text from the background, whereas the proposed method obtains relatively good results by using texture and depth information together. The proposed method improves the accuracy of character area verification by greatly reducing the extraction of non-character regions with the help of three-dimensional depth information. The proposed text region extraction approach is expected to be useful in computer vision applications such as movie caption recognition, character recognition, and license plate extraction.
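The correctness metric described above can be illustrated with a minimal sketch. The function name `correctness_percent` and the boolean-list input are assumptions made for illustration only; the abstract defines just the ratio itself, not how each localized string is judged to be exact.

```python
def correctness_percent(is_exact):
    """Percentage of localized strings that are exactly localized.

    is_exact: one boolean per localized string (hypothetical input format);
    True means the localized string contains no non-character area.
    """
    if not is_exact:
        raise ValueError("at least one localized string is required")
    exact_count = sum(1 for flag in is_exact if flag)
    return 100.0 * exact_count / len(is_exact)


# Four localized strings, one of which includes a non-character area.
print(correctness_percent([True, True, False, True]))  # prints 75.0
```

Under this reading, a detector that localizes many strings but often includes background clutter scores lower than one that localizes fewer strings cleanly, which matches the abstract's emphasis on rejecting non-character regions.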