Academic paper

Paper title    Fast document image comparison in multilingual corpus without OCR
Authors    Yuping Lin, Yingyu Li, et al.
Publication/completion date    2015-10-08
Journal    Multimedia Systems
Volume/issue   
Related articles   
Abstract    This paper proposes a method for comparing document images in a multilingual corpus. The method consists of character segmentation, feature extraction, and similarity measurement. Character segmentation follows a top-down strategy: projection with a self-adaptive threshold is used to analyze the layout, and text lines are then segmented by horizontal projection. Next, English, Chinese, and Japanese are identified by different methods based on the distribution and ratios of the text lines. Finally, characters are segmented with a strategy specific to each language. For feature extraction and similarity measurement, four features are used for coarse measurement, and a template is then built. Based on these templates, a fast template matching method using a coarse-to-fine strategy and bit memory is presented for precise matching. The experimental results demonstrate that the method can handle multilingual document images of different resolutions and font sizes with high precision and speed.
DOI    10.1007/s00530-015-0484-3
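The text-line segmentation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-of-rows binary image format (1 = ink, 0 = background), and the fixed `min_ink` threshold are all assumptions made for illustration. The abstract mentions a self-adaptive threshold but does not specify it, so a constant stands in for it here.

```python
from typing import List, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows with enough ink.

    min_ink is a hypothetical fixed threshold; the paper's self-adaptive
    threshold is not described in the abstract and is not reproduced here.
    """
    profile = horizontal_projection(image)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the text line ended at the previous row
            start = None
    if start is not None:                  # a line that runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 6x4 "page" with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(segment_text_lines(page))  # [(1, 3), (4, 5)]
```

The same projection idea, applied column-wise within each detected line, is a common starting point for character segmentation, though the abstract indicates the paper uses a different strategy for each language.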