Software Tools                                                      

 The following software tools were developed by members of the EDEN research group. You may use them freely for academic research, but they come with no commitment to support or maintenance.


 ►   FAST for feature subset selection


FAST is an implementation of the feature subset selection algorithm published in the TKDE journal. It can be used to effectively select the more useful features in high-dimensional classification problems.


The corresponding paper is:

Qinbao Song, Jingjie Ni, and Guangtao Wang: A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data. IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 25, no. 1, pp. 1-14, 2013.


FAST runs under WEKA. Its Java package can be downloaded HERE, and the README file explains how to use it.


 ►   SplitBal & ClusterBal for classifying imbalanced data


The SplitBal and ClusterBal tools were developed in Java for handling binary class-imbalance problems. The details of the underlying methods can be found in:


Zhongbin Sun, Qinbao Song, Xiaoyan Zhu, Heli Sun, Baowen Xu, and Yuming Zhou: A Novel Ensemble Method for Classifying Imbalanced Data, Pattern Recognition, vol. 48, no. 5, pp. 1623-1637, 2015.

These two tools run under WEKA. The Java package and the README file can be downloaded HERE.



 ►   EM1vs1 for software defect prediction


The EM1vs1 tool was developed for software defect prediction, which can be viewed as a classification problem. Its distinguishing feature is that it explicitly takes into account the class imbalance of software defect data. The details can be found in:


Zhongbin Sun, Qinbao Song, and Xiaoyan Zhu: Using Coding Based Ensemble Learning to Improve Software Defect Prediction, IEEE Transactions on Systems, Man, and Cybernetics (TSMC), vol. 42, no. 6, pp. 1806-1817, 2012.


This tool was written in Java and runs under WEKA. The Java package and the README file can be downloaded HERE.


► FOIL Rule Based Feature Subset Selection for High Dimensional Data


        This feature subset selection tool was written in Java and runs under WEKA. The Java package and the manual can be downloaded HERE.


        This tool is an implementation of the algorithm published in the journal Pattern Recognition:


        Guangtao Wang, Qinbao Song, Baowen Xu, and Yuming Zhou: Selecting Feature Subset for High Dimensional Data via the Propositional FOIL Rules, Pattern Recognition, vol. 46, no. 1, pp. 199-214, 2013.


 ►   Data Set Characteristics Extraction Tool


       Data set characteristics describe a data set and can be used for many purposes, such as classification algorithm recommendation.


       This tool extracts five different types of characteristics from a given data set, using the methods presented in the following papers:


       Guangtao Wang, Qinbao Song, Xueying Zhang, and Kaiyuan Zhang: A Generic Multi-label Learning Based Classification Algorithm Recommendation Method, ACM Transactions on Knowledge Discovery from Data, vol. 9, no. 1, pp. 7:1-7:30, 2014.


       Qinbao Song, Guangtao Wang, and Chao Wang: Automatic Recommendation of Classification Algorithms Based on Data Set Characteristics, Pattern Recognition, vol. 45, no. 7, pp. 2672-2689, 2012.


       This tool was written in Java and runs under WEKA. The Java package and the manual can be downloaded HERE.


 ►   Software Defect Association Mining Tool


       The software defect association mining tool was written in Java and runs under WEKA. The Java package and the manual can be downloaded HERE.


       The method used by this tool is presented in the paper:


       Qinbao Song, Martin Shepperd, Michelle Cartwright, and Carolyn Mair: Software Defect Association Mining and Defect Correction Effort Prediction, IEEE Transactions on Software Engineering, vol. 32, no. 2, pp. 69-82, 2006.






A Guide to Survival in Science. Getting a Ph.D. is just the beginning of a scientific career; there are many important "life" skills still to learn. Physicist Peter J. Feibelman lays out a rational path to a fulfilling long-term research career. His book "A Ph.D. Is Not Enough - A Guide to Survival in Science" is full of advice for people at all levels of academia. It is essential reading for anyone thinking of applying to graduate school or entering the science job market.