Research - Yingyu Li - Faculty Personal Homepage

Academic Papers

Paper Title	Multilingual corpus construction based on printed and handwritten character separation
Authors	Yuping Lin, Yonghong Song, Yingyu Li, et al.
Publication/Completion Date	2015-10-24
Journal	Multimedia Tools & Applications
Volume/Issue
Abstract	This paper proposes an effective method for extracting printed and handwritten characters from multilingual document images in order to build a corpus. A connected component analysis method first removes graphics from the document images. Multiple types of features and the AdaBoost algorithm are then used to classify printed and handwritten characters in a more versatile and robust way. First, the image content is divided into text patches, which are used to distinguish between languages. Second, classifiers are trained on the segmented patches using the multiple feature types and AdaBoost. Finally, the trained classifiers separate the printed and handwritten parts of a new image set. The proposed method improves the precision with which written material is extracted from text images in different languages. Experimental results show that the proposed method achieves higher precision and recall than state-of-the-art methods. DOI: 10.1007/s11042-015-2995-5
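The classification step described in the abstract can be illustrated with a minimal AdaBoost sketch. This is not the authors' implementation: the abstract does not name the weak learner or the features, so the decision stumps, the two feature names (stroke-width variance and character-height variance), and the toy values below are all assumptions chosen only to show how boosting combines weak classifiers over per-patch feature vectors.

```python
import math

# Labels: +1 = printed patch, -1 = handwritten patch (an assumed convention).

def stump_predict(row, j, t, polarity):
    """Threshold test on feature j: returns polarity if row[j] > t, else -polarity."""
    return polarity if row[j] > t else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump (feature, threshold, polarity) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below the minimum, then midpoints between distinct values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost_fit(X, y, rounds=10):
    """Train a weighted ensemble of stumps with the standard AdaBoost update."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, t, pol))
        if err <= 1e-10:
            break  # a single stump already separates the training data
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(row, j, t, pol) for alpha, j, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical per-patch features: [stroke_width_variance, char_height_variance].
X = [[0.10, 0.20], [0.15, 0.10], [0.20, 0.25],   # printed-like: low variance
     [0.80, 0.70], [0.90, 0.60], [0.70, 0.90]]   # handwritten-like: high variance
y = [1, 1, 1, -1, -1, -1]

model = adaboost_fit(X, y)
predictions = [adaboost_predict(model, row) for row in X]
```

On real document patches the features would overlap far more than in this toy set, so several boosting rounds would contribute stumps rather than one; a library implementation would normally be preferable to this hand-rolled version.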