Paper

Paper Name    Fast document image comparison in multilingual corpus without OCR
Author    Yuping Lin, Yingyu Li, et al.
Publication/Completion Time    2015-10-08
Magazine Name    Multimedia Systems
Vol   
Related articles   
Paper description    This paper proposes a method for comparing document images in a multilingual corpus. The method consists of character segmentation, feature extraction, and similarity measurement. Character segmentation follows a top-down strategy: projection with a self-adaptive threshold is used to analyze the layout, and text lines are then segmented by horizontal projection. Next, English, Chinese, and Japanese are identified by different methods based on the distribution and ratios of the text lines. Finally, characters are segmented with a strategy specific to each language. For feature extraction and similarity measurement, four features are used for coarse measurement, and a template is then set up. Based on the templates, a fast template matching method using a coarse-to-fine strategy and bit memory is presented for precise matching. The experimental results demonstrate that the method can handle multilingual document images of different resolutions and font sizes with high precision and speed.
DOI    10.1007/s00530-015-0484-3
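
The text-line segmentation by horizontal projection mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name segment_text_lines, the min_ink parameter, and the fixed threshold are assumptions made for illustration (the paper uses a self-adaptive threshold whose details are not given here). The input is assumed to be a binarized page in which 1 marks an ink pixel and 0 marks background.

```python
from typing import List, Tuple


def segment_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized page into text lines using a horizontal projection.

    Returns (start_row, end_row) pairs, with end_row exclusive.
    min_ink is a hypothetical fixed threshold, not the paper's adaptive one.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # first row of a new text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the bottom edge
    return lines


# Tiny synthetic page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(segment_text_lines(page))  # [(1, 3), (4, 5)]
```

Applying the same idea to column sums within one text line (a vertical projection) is a common way to find candidate character boundaries, although the abstract does not state which per-language strategies the authors actually use.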