Abstract:
Faulty software has expensive consequences. To mitigate these consequences, software
developers have to identify and fix faulty software components before releasing
software products. Similarly, users have to gauge the delivered quality of software
before adopting it. However, the abstract nature of software and the multiple
dimensions of software quality impede organizations from measuring software quality.
Software quality metrics can be used as proxies of software quality to ease the
complexity of measuring the reliability dimension of software quality. Previous studies
have suggested that software process metrics are better predictors of software faults
than software product metrics. However, there is a need for a specific software process
metric that can guarantee consistently superior fault prediction performance across
different contexts. This research sought to determine a predictor for software faults that
has the best prediction performance, requires the least effort to detect software faults,
and has the minimum cost of misclassifying components. In addition, the study investigated
the effect of combining predictors on the performance of software fault prediction
models. Experimental data sets for this study were derived from four heterogeneous Open
Source Software projects. A Logistic Regression algorithm was used to predict the bug
status of each file, while a Linear Regression algorithm was used to predict the number of
bugs per file. The prediction performance of the models built with software metrics as predictors was
evaluated against numerical model performance measures, effort of prediction, and cost
of misclassification of components. Models built with Change Burst metrics registered
the best overall performance compared to those built with Change, Code Churn,
Developer Networks and Source Code software metrics. Change Burst metrics recorded the
highest values for numerical performance measures, exhibited the highest fault
detection probabilities, ranging from 68% to 55% upon examination of only 20% of source
code, and had the least cost of misclassification of components. Combining software
metrics was found not to significantly improve the performance of software fault
prediction models. The study concluded that the Change Burst metrics model could
effectively predict software faults. Random Forest's IncNodePurity revealed that six
Change Burst metrics were most influential in predicting software faults. This study
recommended that the six Change Burst metrics should be used when predicting software faults.
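
The effort-based criterion mentioned above (the share of faults detected after examining only 20% of the source code) can be illustrated with a minimal sketch. This is not the study's own procedure. It assumes that files are inspected in descending order of predicted risk, that effort is measured in lines of code, and that inspection stops at the first file that would exceed the budget. The function name, the field names (`loc`, `risk`, `bugs`) and the sample data are all hypothetical.

```python
def faults_detected_at_effort(files, effort=0.20):
    """Return the fraction of all known faults found when files are inspected
    in descending predicted-risk order within a lines-of-code budget.

    Assumption: each file is a dict with hypothetical keys
    'loc' (lines of code), 'risk' (model score) and 'bugs' (actual faults).
    """
    total_loc = sum(f["loc"] for f in files)
    total_bugs = sum(f["bugs"] for f in files)
    if total_loc == 0 or total_bugs == 0:
        return 0.0

    budget = effort * total_loc  # e.g. 20% of all lines of code
    inspected_loc = 0
    found = 0
    for f in sorted(files, key=lambda f: f["risk"], reverse=True):
        # Stop at the first file that does not fit in the remaining budget.
        if inspected_loc + f["loc"] > budget:
            break
        inspected_loc += f["loc"]
        found += f["bugs"]
    return found / total_bugs


# Made-up example data, for illustration only.
sample = [
    {"name": "a.c", "loc": 100, "risk": 0.9, "bugs": 3},
    {"name": "b.c", "loc": 80, "risk": 0.8, "bugs": 1},
    {"name": "c.c", "loc": 300, "risk": 0.4, "bugs": 1},
    {"name": "d.c", "loc": 500, "risk": 0.1, "bugs": 0},
]
share = faults_detected_at_effort(sample, effort=0.20)
print(f"Faults detected at 20% effort: {share:.0%}")
```

A model whose highest-ranked files are small and fault-dense scores well on this measure, which is why it complements purely numerical performance measures. Ranking by risk per line of code instead of raw risk is a common alternative design.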