► FAST for feature subset selection
FAST implements the feature subset selection algorithm published in IEEE TKDE and can be used to effectively select useful features for high-dimensional classification problems.
The corresponding paper is:
Qinbao Song, Jingjie Ni, and Guangtao Wang: A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data. IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 25, no. 1, pp. 1-14, 2013.
FAST runs under WEKA. Its Java package can be downloaded HERE, and the README file explains how to use it.
► SplitBal & ClusterBal for classifying imbalanced data
The SplitBal and ClusterBal tools were developed in Java for binary class-imbalance problems. The details of the underlying methods can be found in:
Zhongbin Sun, Qinbao Song, Xiaoyan Zhu, Heli Sun, Baowen Xu, and Yuming Zhou: A Novel Ensemble Method for Classifying Imbalanced Data. Pattern Recognition, vol. 48, no. 5, pp. 1623-1637, 2015.
These two tools run under WEKA; the Java package and the README file can be downloaded HERE.
► EM1vs1 for software defect prediction
The EM1vs1 tool was developed for software defect prediction, which can be viewed as a classification problem. Its distinguishing feature is that it explicitly takes into account the class imbalance of software defect data. The details can be found in:
Zhongbin Sun, Qinbao Song, and Xiaoyan Zhu: Using Coding Based Ensemble Learning to Improve Software Defect Prediction. IEEE Transactions on Systems, Man, and Cybernetics (TSMC), vol. 42, no. 6, pp. 1806-1817, 2012.
This tool was written in Java and runs under WEKA; the Java package and the README file can be downloaded HERE.
► FOIL Rule Based Feature Subset Selection for High Dimensional Data
This feature subset selection tool was written in Java and runs under WEKA; the Java package and the manual can be downloaded HERE.
This tool is the implementation of the algorithm published in the Pattern Recognition journal:
Guangtao Wang, Qinbao Song, Baowen Xu, and Yuming Zhou: Selecting Feature Subset for High Dimensional Data via the Propositional FOIL Rules. Pattern Recognition, vol. 46, no. 1, pp. 199-214, 2013.
► Data Set Characteristics Extraction Tool
Data set characteristics describe the properties of a data set and can be used for many purposes, such as recommending classification algorithms.
This tool extracts five different types of data set characteristics using the methods presented in the following papers:
Guangtao Wang, Qinbao Song, Xueying Zhang, and Kaiyuan Zhang: A Generic Multi-label Learning Based Classification Algorithm Recommendation Method. ACM Transactions on Knowledge Discovery from Data, vol. 9, no. 1, pp. 7:1-7:30, 2014.
Qinbao Song, Guangtao Wang, and Chao Wang: Automatic Recommendation of Classification Algorithms Based on Data Set Characteristics. Pattern Recognition, vol. 45, no. 7, pp. 2672-2689, 2012.
This tool was written in Java and runs under WEKA; the Java package and the manual can be downloaded HERE.
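As a rough illustration of what a data set characteristic is, here is a minimal sketch that computes a few simple statistical measures of a small labelled data set: the numbers of instances, attributes, and classes, and the entropy of the class distribution. It is a hypothetical example in plain Java without WEKA; the class name `DatasetCharacteristics`, its methods, and the toy data are made up for illustration, and these measures are not necessarily among the five types of characteristics the tool extracts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: a few simple statistical characteristics of a labelled data set. */
public class DatasetCharacteristics {

    /** Number of distinct class labels. */
    public static int numClasses(String[] labels) {
        return (int) Arrays.stream(labels).distinct().count();
    }

    /** Shannon entropy (in bits) of the class-label distribution; 0 for an empty array. */
    public static double classEntropy(String[] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        // Count how often each class label occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        // H = -sum_c p(c) * log2 p(c)
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Toy data set (made-up values): 4 instances, 2 numeric attributes, binary class.
        double[][] features = {{1.0, 2.0}, {1.5, 1.8}, {5.0, 8.0}, {6.0, 9.0}};
        String[] labels = {"yes", "yes", "no", "no"};

        System.out.println("instances:  " + features.length);
        System.out.println("attributes: " + features[0].length);
        System.out.println("classes:    " + numClasses(labels));
        System.out.println("class entropy (bits): " + classEntropy(labels));
    }
}
```

A balanced binary class distribution gives the maximum entropy of 1 bit, while a heavily imbalanced one approaches 0, which is one reason class entropy is a common, cheap characteristic to compute alongside simple counts.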
► Software Defect Association Mining Tool
The software defect association mining tool was written in Java and runs under WEKA; the Java package and the manual can be downloaded HERE.
The method used by this tool is presented in the paper:
Qinbao Song, Martin Shepperd, Michelle Cartwright, and Carolyn Mair: Software Defect Association Mining and Defect Correction Effort Prediction. IEEE Transactions on Software Engineering, vol. 32, no. 2, pp. 69-82, 2006.
► T3
► A Guide to Survival in Science
Getting a Ph.D. is just the beginning of a scientific career; there are many important "life" skills still to learn. Physicist Peter J. Feibelman lays out a rational path to a fulfilling long-term research career. His book "A Ph.D. Is Not Enough - A Guide to Survival in Science" is full of advice for academics at all levels. It is essential reading for anyone thinking of applying to graduate school or entering the science job market.