29 References
Abran, A., and P. N. Robillard. 1996. “Function Points Analysis:
An Empirical Study of Its Measurement Processes.” IEEE Transactions on
Software Engineering 22 (12): 895–910. https://doi.org/10.1109/32.553638.
Afzal, Wasif, Richard Torkar, and Robert Feldt. 2012. “Resampling
Methods in Software Quality Classification.” International
Journal of Software Engineering and Knowledge Engineering 22:
203–23.
Aguilar-Ruiz, Jesús S., Isabel Ramos, José C. Riquelme, and Miguel Toro.
2001. “An Evolutionary Approach to Estimating Software Development
Projects.” Information and Software Technology 43 (14):
875–82. https://doi.org/10.1016/S0950-5849(01)00193-8.
Aguilar-Ruiz, Jesús S., José Cristóbal Riquelme, and Isabel Ramos. 2002.
“Natural Evolutionary Coding: An Application to Estimating
Software Development Projects.” Proceedings of the 2002
Conference on Genetic and Evolutionary Computation (GECCO ’02) (New
York, USA), 1–8.
Aha, D. W., D. Kibler, and M. K. Albert. 1991. “Instance-Based
Learning Algorithms.” Machine Learning 6: 37–66.
Albrecht, A. J., and J. E. Gaffney Jr. 1983. “Software Function,
Source Lines of Code, and Development Effort Prediction: A Software
Science Validation.” IEEE Transactions on Software
Engineering 9 (6): 639–48. https://doi.org/10.1109/TSE.1983.235271.
Alcalá-Fdez, J., A. Fernandez, J. Luengo, S. García, J. Derrac, L.
Sánchez, and F. Herrera. 2011. “KEEL Data-Mining
Software Tool: Data Set Repository, Integration of Algorithms and
Experimental Analysis Framework.” Journal of Multiple-Valued
Logic and Soft Computing 17 (2-3): 255–87.
Alcalá-Fdez, J., L. Sánchez, S. García, et al. 2009.
“KEEL: A Software Tool to Assess Evolutionary
Algorithms to Data Mining Problems.” Soft Computing 13
(3): 307–18.
Ali, Shaukat, Lionel C. Briand, Hadi Hemmati, and Rajwinder Kaur
Panesar-Walawege. 2010. “A Systematic Review of the Application
and Empirical Investigation of Search-Based Test-Case
Generation.” IEEE Transactions on Software Engineering
(Los Alamitos, CA, USA) 36 (6): 742–62. https://doi.org/10.1109/TSE.2009.52.
Allaire, JJ, Joe Cheng, Yihui Xie, et al. 2016. Rmarkdown: Dynamic
Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Alves, T. L., C. Ypma, and J. Visser. 2010. “Deriving Metric
Thresholds from Benchmark Data.” IEEE International
Conference on Software Maintenance (ICSM’2010), 1–10. https://doi.org/10.1109/ICSM.2010.5609747.
Arisholm, Erik, Lionel C. Briand, and Eivind B. Johannessen. 2010.
“A Systematic and Comprehensive Investigation of Methods to Build
and Evaluate Fault Prediction Models.” Journal of Systems and
Software (New York, NY, USA) 83 (1): 2–17. https://doi.org/10.1016/j.jss.2009.06.055.
Bagnall, A. J., V. J. Rayward-Smith, and I. M. Whittley. 2001.
“The Next Release Problem.” Information and Software
Technology 43 (14): 883–90. https://doi.org/10.1016/S0950-5849(01)00194-X.
Bailey, John W., and Victor R. Basili. 1981. “A Meta-Model for
Software Development Resource Expenditures.” Proceedings of
the 5th International Conference on Software Engineering (ICSE’81)
(Piscataway, NJ, USA), ICSE’81, 107–16. http://dl.acm.org/citation.cfm?id=800078.802522.
Banker, Rajiv D., Hsihui Chang, and Chris F. Kemerer. 1994.
“Evidence on Economies of Scale in Software Development.”
Information and Software Technology 36 (5): 275–82. https://doi.org/10.1016/0950-5849(94)90083-3.
Bansiya, J., and C. G. Davis. 2002. “A Hierarchical Model for
Object-Oriented Design Quality Assessment.” IEEE Transactions
on Software Engineering 28 (1): 4–17. https://doi.org/10.1109/32.979986.
Belady, L. A., and M. M. Lehman. 1979. Research Directions in
Software Technology. MIT Press.
Benlarbi, S., K. El Emam, N. Goel, and S. Rai. 2000. “Thresholds
for Object-Oriented Measures.” Proceedings 11th International
Symposium on Software Reliability Engineering (ISSRE 2000), 24–38.
https://doi.org/10.1109/ISSRE.2000.885858.
Blum, A. L., and P. Langley. 1997. “Selection of Relevant Features
and Examples in Machine Learning.” Artificial
Intelligence 97 (1-2): 245–71.
Boehm, Barry W. 1981. Software Engineering Economics. 1st ed.
Prentice Hall PTR.
Boetticher, G., T. Menzies, and T. Ostrand. 2007. PROMISE Repository
of Empirical Software Engineering Data. West Virginia University,
Department of Computer Science. http://promisedata.org/repository.
Bowes, David, Tracy Hall, and Jean Petrić. 2015. “Different
Classifiers Find Different Defects Although with Different Level of
Consistency.” Proceedings of the 11th International
Conference on Predictive Models and Data Analytics in Software
Engineering (PROMISE’15) (New York, NY, USA), PROMISE’15, 3:1–10. https://doi.org/10.1145/2810146.2810149.
Breiman, Leo. 1996. “Bagging Predictors.” Machine
Learning 24: 123–40. https://doi.org/10.1007/BF00058655.
Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J.
Stone. 1984. Classification and Regression Trees. Wadsworth
International Group.
Caglayan, Bora, Ayse Bener, and Stefan Koch. 2009. “Merits of
Using Repository Metrics in Defect Prediction for Open Source
Projects.” Proceedings of the 2009 ICSE Workshop on Emerging
Trends in Free/Libre/Open Source Software Research and Development
(Washington, DC, USA), FLOSS’09, 31–36. https://doi.org/10.1109/FLOSS.2009.5071357.
Canfora, G., A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, and S.
Panichella. 2013. “Multi-Objective Cross-Project Defect
Prediction.” IEEE Sixth International Conference on Software
Testing, Verification and Validation, 252–61. https://doi.org/10.1109/ICST.2013.38.
Cano, José Ramón, Francisco Herrera, and Manuel Lozano. 2007.
“Evolutionary Stratified Training Set Selection for Extracting
Classification Rules with Trade Off Precision-Interpretability.”
Data & Knowledge Engineering 60 (1): 90–108. https://doi.org/10.1016/j.datak.2006.01.008.
Catal, Cagatay. 2011. “Software Fault Prediction: A Literature
Review and Current Trends.” Expert Systems with
Applications 38 (4): 4626–36. https://doi.org/10.1016/j.eswa.2010.10.024.
Catal, Cagatay. 2012. “Software Mining and Fault
Prediction.” Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery 2 (5): 420–26. https://doi.org/10.1002/widm.1067.
Catal, Cagatay, and Banu Diri. 2009. “A Systematic Review of
Software Fault Prediction Studies.” Expert Systems with
Applications 36 (4): 7346–54. https://doi.org/10.1016/j.eswa.2008.10.027.
Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002.
“SMOTE: Synthetic Minority over-Sampling
Technique.” Journal of Artificial Intelligence Research
16: 321–57.
Chawla, N. V., A. Lazarevic, L. O. Hall, and K. W. Bowyer. 2003.
“SMOTEBoost: Improving Prediction of the Minority Class in
Boosting.” 7th European Conference on Principles and Practice
of Knowledge Discovery in Databases (PKDD 2003),
107–19.
Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip
Kegelmeyer. 2002. “SMOTE: Synthetic Minority over-Sampling
Technique.” J. Artif. Intell. Res. (JAIR) 16: 321–57.
Chen, Xiaojun, Yunming Ye, Graham Williams, and Xiaofei Xu. 2007.
“A Survey of Open Source Data Mining Systems.” In
Emerging Technologies in Knowledge Discovery and Data Mining,
edited by Takashi Washio, Zhi-Hua Zhou, Joshua Zhexue Huang, et al., vol.
4819. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Chen, Z., T. Menzies, D. Port, and B. Boehm. 2005. “Finding the
Right Data for Software Cost Modeling.” IEEE Software 22
(6): 38–46. https://doi.org/10.1109/MS.2005.151.
Chidamber, S. R., and C. F. Kemerer. 1994. “A Metrics Suite for
Object Oriented Design.” IEEE Transactions on Software
Engineering 20 (6): 476–93. https://doi.org/10.1109/32.295895.
Clauset, Aaron, Cosma Rohilla Shalizi, and M. E. J. Newman. 2007.
Power-Law Distributions in Empirical Data. http://www.citebase.org/abstract?id=oai:arXiv.org:0706.1062.
Concas, G., M. Marchesi, A. Murgia, R. Tonelli, and I. Turnu. 2011.
“On the Distribution of Bugs in the Eclipse System.”
IEEE Transactions on Software Engineering 37 (6): 872–77. https://doi.org/10.1109/TSE.2011.54.
D’Ambros, Marco, Michele Lanza, and Romain Robbes. 2010. “An
Extensive Comparison of Bug Prediction Approaches.”
Proceedings of the 7th IEEE Working Conference on Mining Software
Repositories (MSR07), 31–41.
D’Ambros, Marco, Michele Lanza, and Romain Robbes. 2011.
“Evaluating Defect Prediction Approaches: A Benchmark and an
Extensive Comparison.” Empirical Software Engineering,
1–47. https://doi.org/10.1007/s10664-011-9173-9.
Dash, M., and H. Liu. 1997. “Feature Selection for
Classification.” Intelligent Data Analysis 1 (3):
131–56.
Dash, M., H. Liu, and H. Motoda. 2000. “Consistency Based Feature
Selection.” Pacific-Asia Conf. on Knowledge Discovery and
Data Mining, 98–109.
Dean, Jeffrey, and Sanjay Ghemawat. 2008. “MapReduce: Simplified
Data Processing on Large Clusters.” Commun. ACM (New
York, NY, USA) 51 (1): 107–13. https://doi.org/10.1145/1327452.1327492.
Dejaeger, K., W. Verbeke, D. Martens, and B. Baesens. 2012. “Data
Mining Techniques for Software Effort Estimation: A Comparative
Study.” IEEE Transactions on Software Engineering 38
(2): 375–97. https://doi.org/10.1109/TSE.2011.55.
Demšar, Janez. 2006. “Statistical Comparisons of Classifiers over
Multiple Data Sets.” Journal of Machine Learning
Research 7 (December): 1–30. http://dl.acm.org/citation.cfm?id=1248547.1248548.
Deocadez, Roger, Rachel Harrison, and Daniel Rodriguez. 2017.
“Preliminary Study on Applying Semi-Supervised Learning to App
Store Analysis.” Proceedings of the 21st International
Conference on Evaluation and Assessment in Software Engineering
(EASE’17) (New York, NY, USA), EASE’17, 320–23. https://doi.org/10.1145/3084226.3084285.
Desharnais, J. M. 1988. “Analyse Statistique de La Productivite
Des Projects de Development En Informatique a Partir de La Technique Des
Points de Fonction.” MSc Thesis, Univ. du Quebec a Montreal.
Doak, J. 1992. An Evaluation of Feature Selection Methods and Their
Application to Computer Security. CSE-92-18. University of
California, Department of Computer Science.
Dolado, J. J. 1997. “A Study of the Relationships Among Albrecht
and Mark II Function Points, Lines of Code 4GL and Effort.”
Journal of Systems and Software 37 (2): 161–73. https://doi.org/10.1016/S0164-1212(96)00111-2.
Dolado, J. J. 2001. “On the Problem of the Software Cost
Function.” Information and Software Technology 43 (1):
61–72. https://doi.org/10.1016/S0950-5849(00)00137-3.
Dolado, José Javier, Daniel Rodriguez, Mark Harman, William B. Langdon,
and Federica Sarro. 2016. “Evaluation of Estimation Models Using
the Minimum Interval of Equivalence.” Applied Soft
Computing 49: 956–67. https://doi.org/10.1016/j.asoc.2016.03.026.
EBSE. 2010. Template for a Systematic Literature Review
Protocol. http://www.dur.ac.uk/ebse/resources/templates/SLRTemplate.pdf.
Elish, Karim O., and Mahmoud O. Elish. 2008. “Predicting
Defect-Prone Software Modules Using Support Vector Machines.”
Journal of Systems and Software 81 (5): 649–60. https://doi.org/10.1016/j.jss.2007.07.040.
Fawcett, Tom. 2006. “An Introduction to ROC Analysis.”
Pattern Recognition Letters 27 (8): 861–74. https://doi.org/10.1016/j.patrec.2005.10.010.
Fayyad, Usama, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996.
“The KDD Process for Extracting Useful Knowledge from Volumes of
Data.” Commun. ACM (New York, NY, USA) 39 (11): 27–34.
https://doi.org/10.1145/240455.240464.
Feinerer, Ingo, and Kurt Hornik. 2015. Tm: Text Mining Package.
http://CRAN.R-project.org/package=tm.
Feinerer, Ingo, Kurt Hornik, and David Meyer. 2008. “Text Mining
Infrastructure in R.” Journal of Statistical Software 25
(5): 1–54. http://www.jstatsoft.org/v25/i05/.
Fernández, Alberto, Salvador García, and Francisco Herrera. 2011.
“Addressing the Classification with Imbalanced Data: Open Problems
and New Challenges on Class Distribution.” 6th International
Conference on Hybrid Artificial Intelligence Systems (HAIS), 1–10.
Fernández-Delgado, Manuel, Eva Cernadas, Senén Barro, and Dinani Amorim.
2014. “Do We Need Hundreds of Classifiers to Solve Real World
Classification Problems?” Journal of Machine Learning
Research 15: 3133–81. http://jmlr.org/papers/v15/delgado14a.html.
Finkelstein, Anthony, Mark Harman, S. Afshin Mansouri, Jian Ren, and
Yuanyuan Zhang. 2009. “A Search Based Approach to Fairness
Analysis in Requirement Assignments to Aid Negotiation, Mediation and
Decision Making.” Requirements Engineering Journal (RE’08
Special Issue) 14 (4): 231–45. https://doi.org/10.1007/s00766-009-0075-y.
Flach, Peter A., José Hernández-Orallo, and Cèsar Ferri Ramirez. 2011.
“A Coherent Interpretation of AUC as a Measure of Aggregated
Classification Performance.” Proceedings of the 28th
International Conference on Machine Learning (ICML’11) (Bellevue,
Washington, USA), 657–64.
Freund, Yoav, Robert Schapire, and Naoki Abe. 1999. “A Short
Introduction to Boosting.” Journal of Japanese Society for
Artificial Intelligence 14 (5): 771–80.
Garcia, S., J. Derrac, J. Cano, and F. Herrera. 2012. “Prototype
Selection for Nearest Neighbor Classification: Taxonomy and Empirical
Study.” IEEE Transactions on Pattern Analysis and Machine
Intelligence 34 (3): 417–35. https://doi.org/10.1109/TPAMI.2011.142.
García, Salvador, Julián Luengo, and Francisco Herrera. 2015. Data
Preprocessing in Data Mining. Springer. https://doi.org/10.1007/978-3-319-10247-4.
González-Barahona, Jesús, and Gregorio Robles. 2012. “On the
Reproducibility of Empirical Software Engineering Studies Based on Data
Retrieved from Development Repositories.” Empirical Software
Engineering 17: 75–89. https://doi.org/10.1007/s10664-011-9181-9.
Gray, D., D. Bowes, N. Davey, Y. Sun, and B. Christianson. 2011.
“The Misuse of the NASA Metrics Data Program Data Sets for
Automated Software Defect Prediction.” 15th Annual Conference
on Evaluation Assessment in Software Engineering (EASE 2011),
96–103. https://doi.org/10.1049/ic.2011.0012.
Guyon, I., and A. Elisseeff. 2003. “An Introduction to Variable
and Feature Selection.” Journal of Machine Learning
Research 3: 1157–82.
Hall, M. A. 1999. “Correlation-Based Feature Selection for Machine
Learning.” PhD thesis, University of Waikato, Department of
Computer Science.
Hall, T., S. Beecham, D. Bowes, D. Gray, and S. Counsell. 2012. “A
Systematic Literature Review on Fault Prediction Performance in Software
Engineering.” IEEE Transactions on Software Engineering
38 (6): 1276–304. https://doi.org/10.1109/TSE.2011.103.
Hall, Tracy, Sarah Beecham, David Bowes, David Gray, and Steve Counsell.
2011. “A Systematic Literature Review on Fault Prediction
Performance in Software Engineering.” Transactions on
Software Engineering.
Halstead, M. H. 1977. Elements of Software Science. Elsevier
Computer Science Library. Operating and Programming Systems Series; 2.
Elsevier.
Hand, David J. 2009. “Measuring Classifier Performance: A Coherent
Alternative to the Area Under the ROC Curve.” Machine
Learning (Hingham, MA, USA) 77 (1): 103–23. https://doi.org/10.1007/s10994-009-5119-5.
Harman, Mark. 2010. “The Relationship Between Search Based
Software Engineering and Predictive Modeling.” Proceedings of
the 6th International Conference on Predictive Models in Software
Engineering (New York, NY, USA), PROMISE ’10, 1:1–13.
https://doi.org/10.1145/1868328.1868330.
Harman, Mark, and Bryan F. Jones. 2001. “Search-Based Software
Engineering.” Information and Software Technology 43
(14): 833–39.
Harman, M., and J. Clark. 2004. “Metrics Are Fitness Functions
Too.” Proceedings. 10th International Symposium on Software
Metrics 2004, 58–69. https://doi.org/10.1109/METRIC.2004.1357891.
Hastings, T. E., and A. S. M. Sajeev. 2001. “A Vector-Based
Approach to Software Size Measurement and Effort Estimation.”
IEEE Transactions on Software Engineering 27 (4): 337–50. https://doi.org/10.1109/32.917523.
He, Haibo, and E. A. Garcia. 2009. “Learning from Imbalanced
Data.” IEEE Transactions on Knowledge and Data
Engineering 21 (9): 1263–84. https://doi.org/10.1109/TKDE.2008.239.
Heiat, Abbas, and Nafisseh Heiat. 1997. “A Model for Estimating
Efforts Required for Developing Small-Scale Business
Applications.” Journal of Systems and Software 39 (1):
7–14. https://doi.org/10.1016/S0164-1212(96)00159-8.
Herraiz, Israel, Daniel Izquierdo-Cortazar, Francisco Rivas-Hernandez,
et al. 2009. “FLOSSMetrics: Free / Libre / Open
Source Software Metrics.” Proceedings of the 13th European
Conference on Software Maintenance and Reengineering (CSMR)
(Kaiserslautern, Germany).
Herraiz, Israel, Daniel Rodriguez, and Rachel Harrison. 2012. “On
the Statistical Distribution of Object-Oriented System
Properties.” 3rd International Workshop on Emerging Trends in
Software Metrics (WETSoM 2012).
Herrera, Francisco, Cristóbal J. Carmona,
Pedro González, and María José del Jesus. 2011. “An
Overview on Subgroup Discovery: Foundations and Applications.”
Knowledge and Information Systems 29: 495–525.
Hoekstra, Rink, Richard D. Morey, Jeffrey N. Rouder, and Eric-Jan
Wagenmakers. 2014. “Robust Misinterpretation of Confidence
Intervals.” Psychonomic Bulletin &
Review 21 (5): 1157–64. https://doi.org/10.3758/s13423-013-0572-3.
Hosseini, Seyedrebvar, Burak Turhan, and Mika Mantyla. 2017. “A
Benchmark Study on the Effectiveness of Search-Based Data Selection and
Feature Selection for Cross Project Defect Prediction.”
Information and Software Technology. https://doi.org/10.1016/j.infsof.2017.06.004.
Howison, James, Megan Conklin, and Kevin Crowston. 2006.
“FLOSSmole: A Collaborative Repository for
FLOSS Research Data and Analyses.” International
Journal of Information Technology and Web Engineering 1 (3). https://doi.org/10.4018/jitwe.2006070102.
Huang, LiGuo, V. Ng, I. Persing, Ruili Geng, Xu Bai, and Jeff Tian.
2011. “AutoODC: Automated Generation of Orthogonal
Defect Classifications.” 26th IEEE/ACM International
Conference on Automated Software Engineering (ASE’2011), November,
412–15. https://doi.org/10.1109/ASE.2011.6100086.
Ibarguren, Igor, J. M. Pérez, Javier Muguerza, Daniel Rodriguez, and
Rachel Harrison. 2017. “The Consolidated Tree Construction
Algorithm in Imbalanced Defect Prediction Datasets.”
Evolutionary Methods and Machine Learning in SE, Testing and SE
Repositories. Proceedings of the 2017 IEEE Congress on Evolutionary
Computation (CEC2017), 96–103.
Japkowicz, N., and M. Shah. 2011. Evaluating Learning Algorithms: A
Classification Perspective. Cambridge University Press.
Jeffery, Ross, and John Stathis. 1996. “Function Point Sizing:
Structure, Validity and Applicability.” Empirical Software
Engineering 1: 11–30. https://doi.org/10.1007/BF00125809.
Jiang, Yue, Bojan Cukic, and Yan Ma. 2008. “Techniques for
Evaluating Fault Prediction Models.” Empirical Software
Engineering 13 (5): 561–95. https://doi.org/10.1007/s10664-008-9079-3.
Jørgensen, M. 2004. “Realism in Assessment of Effort Estimation
Uncertainty: It Matters How You Ask.” IEEE Transactions on
Software Engineering 30 (4): 209–17. https://doi.org/10.1109/TSE.2004.1274041.
Jørgensen, Magne, Ulf Indahl, and Dag Sjøberg. 2003. “Software
Effort Estimation by Analogy and ‘Regression Toward the Mean’.”
Journal of Systems and Software 68 (3): 253–62. https://doi.org/10.1016/S0164-1212(03)00066-9.
Jørgensen, M., and M. Shepperd. 2007. “A Systematic Review of
Software Development Cost Estimation Studies.” IEEE
Transactions on Software Engineering 33 (1): 33–53. https://doi.org/10.1109/TSE.2007.256943.
Jureczko, Marian, and Diomidis Spinellis. 2010. “Using
Object-Oriented Design Metrics to Predict Software Defects.”
Models and Methodology of System Dependability. Proceedings of
RELCOMEX 2010 Fifth International Conference on
Dependability of Computer Systems DepCoS
(Wrocław, Poland), Monographs of system dependability,
69–81. http://www.dmst.aueb.gr/dds/pubs/conf/2010-DepCoS-RELCOMEX-ckjm-defects/html/JS10.html.
Kamei, Y., A. Monden, S. Matsumoto, T. Kakimoto, and K. Matsumoto. 2007.
“The Effects of over and Under Sampling on Fault-Prone Module
Detection.” Empirical Software Engineering and Measurement
(ESEM), 196–204.
Kavšek, Branko, and Nada Lavrač. 2006. “APRIORI-SD:
Adapting Association Rule Learning to Subgroup Discovery.”
Applied Artificial Intelligence 20 (7): 543–83. https://doi.org/10.1080/08839510600779688.
Kaymak, Uzay, Arie Ben-David, and Rob Potharst. 2012. “The AUK: A
Simple Alternative to the AUC.” Engineering Applications of
Artificial Intelligence 25 (5): 1082–89. https://doi.org/10.1016/j.engappai.2012.02.012.
Keivanloo, I., C. Forbes, A. Hmood, et al. 2012. “A Linked Data
Platform for Mining Software Repositories.” Mining Software
Repositories (MSR), 2012 9th IEEE Working Conference on, 32–35. https://doi.org/10.1109/MSR.2012.6224296.
Keivanloo, Iman, Christopher Forbes, Juergen Rilling, and Philippe
Charland. 2011. “Towards Sharing Source Code Facts Using Linked
Data.” Proceedings of the 3rd International Workshop on
Search-Driven Development: Users, Infrastructure, Tools, and
Evaluation (New York, NY, USA), SUITE ’11, 25–28. https://doi.org/10.1145/1985429.1985436.
Kemerer, Chris F. 1987. “An Empirical Validation of Software Cost
Estimation Models.” Communications of the ACM (New York,
NY, USA) 30 (5): 416–29. https://doi.org/10.1145/22899.22906.
Khoshgoftaar, T. M., E. Allen, J. Hudepohl, and S. Aud. 1997.
“Application of Neural Networks to Software Quality Modeling of a
Very Large Telecommunications System.” IEEE Transactions on
Neural Networks 8 (4): 902–9.
Khoshgoftaar, T. M., N. Seliya, and K. Gao. 2005. “Assessment of a
New Three-Group Software Quality Classification Technique: An Empirical
Case Study.” Empirical Software Engineering 10 (2):
183–218.
Khoshgoftaar, T. M., and J. Van Hulse. 2009. “Empirical Case
Studies in Attribute Noise Detection.” Systems, Man, and
Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
39 (4): 379–88. https://doi.org/10.1109/TSMCC.2009.2013815.
Khoshgoftaar, Taghi M., Kehan Gao, and Amri Napolitano. 2012a. “An
Empirical Study of Feature Ranking Techniques for Software Quality
Prediction.” International Journal of Software Engineering
and Knowledge Engineering 22 (2): 161–83.
Khoshgoftaar, Taghi M., Kehan Gao, and Amri Napolitano. 2012b.
“Exploring an Iterative Feature Selection Technique for Highly
Imbalanced Data Sets.” IEEE 13th International Conference on
Information Reuse and Integration (IRI’2012), 101–8. https://doi.org/10.1109/IRI.2012.6302997.
Khoshgoftaar, Taghi M., and Naeem Seliya. 2003. “Analogy-Based
Practical Classification Rules for Software Quality Estimation.”
Empirical Software Engineering (Hingham, MA, USA) 8 (4):
325–50. https://doi.org/10.1023/A:1025316301168.
Khoshgoftaar, Taghi M., and Naeem Seliya. 2004. “Comparative
Assessment of Software Quality Classification Techniques: An Empirical
Case Study.” Empirical Software Engineering (Hingham,
MA, USA) 9 (3): 229–57. https://doi.org/10.1023/B:EMSE.0000027781.18360.9b.
Khoshgoftaar, T., E. Allen, and J. Deng. 2002. “Using Regression
Trees to Classify Fault-Prone Software Modules.” IEEE
Transactions on Reliability.
Kirsopp, C., and M. Shepperd. 2002. “Case and Feature Subset
Selection in Case-Based Software Project Effort Prediction.”
22nd International Conference on Knowledge-Based Systems and Applied
Artificial Intelligence (SGAI’02).
Kitchenham, B. 1998. “A Procedure for Analyzing Unbalanced
Datasets.” IEEE Transactions on Software Engineering 24
(4): 278–301. https://doi.org/10.1109/32.677185.
Kitchenham, Barbara. 2008. “The Role of Replications in Empirical
Software Engineering: A Word of Warning.” Empirical Software
Engineering 13 (2): 219–21.
Kitchenham, Barbara A. 2002. “The Question of Scale Economies in
Software — Why Cannot Researchers Agree?” Information and
Software Technology 44 (1): 13–24. https://doi.org/10.1016/S0950-5849(01)00204-X.
Kitchenham, Barbara A., and S. Charters. 2007. Guidelines for
Performing Systematic Literature Reviews in Software Engineering.
Technical Report EBSE-2007-01. Keele University.
Kitchenham, Barbara A., Tore Dyba, and Magne Jørgensen. 2004.
“Evidence-Based Software Engineering.” Proceedings of
the 26th International Conference on Software Engineering (ICSE’04)
(Washington, DC, USA), 273–81.
Kitchenham, Barbara Ann. 1997. “Evaluating Software Engineering
Methods and Tools, Part 7: Planning Feature Analysis Evaluation.”
SIGSOFT Software Engineering Notes (New York, NY, USA) 22 (4):
21–24. https://doi.org/10.1145/263244.263251.
Kitchenham, Barbara A., and N. R. Taylor. 1985. “Software Project
Development Cost Estimation.” Journal of Systems and
Software 5 (4): 267–78. https://doi.org/10.1016/0164-1212(85)90026-3.
Kitchenham, Barbara, Shari Lawrence Pfleeger, Beth McColl, and Suzanne
Eagan. 2002a. “An Empirical Study of Maintenance and Development
Estimation Accuracy.” Journal of Systems and Software 64
(1): 57–77. https://doi.org/10.1016/S0164-1212(02)00021-3.
Kitchenham, Barbara, Shari Lawrence Pfleeger, Beth McColl, and Suzanne
Eagan. 2002b. “An Empirical Study of Maintenance and Development
Estimation Accuracy.” Journal of Systems and Software 64
(1): 57–77. https://doi.org/10.1016/S0164-1212(02)00021-3.
Klösgen, Willi. 1996. Explora: A Multipattern and Multistrategy
Discovery Assistant. American Association for Artificial
Intelligence.
Knutson, Charles D., Jonathan L. Krein, Lutz Prechelt, and Natalia
Juristo. 2010. “Report from the 1st International Workshop on
Replication in Empirical Software Engineering Research (RESER
2010).” SIGSOFT Software Engineering Notes (New York,
NY, USA) 35: 42–44. https://doi.org/10.1145/1838687.1838698.
Kohavi, R., and G. H. John. 1995. “Automatic Parameter Selection
by Minimizing Estimated Error.” 12th Int. Conf. on Machine
Learning (San Francisco), 304–12.
Kohavi, R., and G. H. John. 1997. “Wrappers for Feature Subset
Selection.” Artificial Intelligence 97 (1-2): 273–324.
Krishnan, Sandeep, Chris Strasburg, Robyn R. Lutz, and Katerina
Goševa-Popstojanova. 2011. “Are Change Metrics Good Predictors for
an Evolving Software Product Line?” Proceedings of the 7th
International Conference on Predictive Models in Software Engineering
(Promise’11) (New York, NY, USA), Promise’11, 7:1–10. https://doi.org/10.1145/2020390.2020397.
Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive
Modeling. Springer-Verlag. https://doi.org/10.1007/978-1-4614-6849-3.
Langdon, William B., Javier Dolado, Federica Sarro, and Mark Harman.
2016. “Exact Mean Absolute Error of Baseline Predictor,
MARP0.” Information and Software Technology (USA) 73
(C): 16–18. https://doi.org/10.1016/j.infsof.2016.01.003.
Langley, P. 1994. “Selection of Relevant Features in Machine
Learning.” Procs. of the AAAI Fall Symposium on
Relevance, 140–44.
Lavrač, Nada, Branko Kavšek, Peter Flach, and Ljupčo Todorovski. 2004.
“Subgroup Discovery with CN2-SD.” The
Journal of Machine Learning Research (Cambridge, MA, USA) 5:
153–88.
Lessmann, S., B. Baesens, C. Mues, and S. Pietsch. 2008.
“Benchmarking Classification Models for Software Defect
Prediction: A Proposed Framework and Novel Findings.” IEEE
Transactions on Software Engineering 34 (4): 485–96. https://doi.org/10.1109/TSE.2008.35.
Li, Jingzhou, and Guenther Ruhe. 2006. “A Comparative Study of
Attribute Weighting Heuristics for Effort Estimation by Analogy.”
Proceedings of the 2006 ACM/IEEE International Symposium on
Empirical Software Engineering (New York, NY, USA), ISESE ’06,
66–74. https://doi.org/10.1145/1159733.1159746.
Li, Jingzhou, and Guenther Ruhe. 2008. “Multi-Criteria Decision
Analysis for Customization of Estimation by Analogy Method
AQUA+.” Proceedings of the 4th International Workshop on
Predictor Models in Software Engineering (PROMISE) (New York, NY,
USA), PROMISE ’08, 55–62.
Li, Jingzhou, Guenther Ruhe, Ahmed Al-Emran, and Michael M. Richter.
2007. “A Flexible Method for Software Effort Estimation by
Analogy.” Empirical Software Engineering 12 (1): 65–106.
https://doi.org/10.1007/s10664-006-7552-4.
Li, Ming, Hongyu Zhang, Rongxin Wu, and Zhi-Hua Zhou. 2012.
“Sample-Based Software Defect Prediction with Active and
Semi-Supervised Learning.” Automated Software
Engineering 19 (2): 201–30. https://doi.org/10.1007/s10515-011-0092-1.
Liebchen, G. 2011. “Data Cleaning Techniques for Software
Engineering Data Sets.” PhD Dissertation, Dept. of Information
Systems and Computing, Brunel University.
Lincke, Rüdiger, Jonas Lundberg, and Welf Löwe. 2008. “Comparing
Software Metrics Tools.” Proceedings of the 2008
International Symposium on Software Testing and Analysis (ISSTA’08)
(New York, NY, USA), ISSTA’08, 131–42. https://doi.org/10.1145/1390630.1390648.
Linstead, Erik, Sushil Bajracharya, Trung Ngo, Paul Rigor, Cristina
Lopes, and Pierre Baldi. 2009. “Sourcerer: Mining and Searching
Internet-Scale Software Repositories.” Data Mining and
Knowledge Discovery 18: 300–336. https://doi.org/10.1007/s10618-008-0118-x.
Liu, H., and H. Motoda. 1998. Feature Selection for Knowledge
Discovery and Data Mining. Kluwer Academic Publishers.
Liu, H., and L. Yu. 2005. “Toward Integrating Feature Selection
Algorithms for Classification and Clustering.” IEEE Trans. on
Knowledge and Data Eng. 17 (3): 1–12.
Lokan, C., T. Wright, P. Hill, and M. Stringer. 2001.
“Organizational Benchmarking Using the ISBSG Data
Repository.” IEEE Software 18 (5): 26–32. https://doi.org/10.1109/52.951491.
Lopes, C., S. Bajracharya, J. Ossher, and P. Baldi. 2010.
UCI Source Code Data Sets. University of
California, Irvine, Bren School of Information and Computer Sciences. http://www.ics.uci.edu/~lopes/datasets/.
Lu, H., B. Cukic, and M. Culp. 2012. “Software Defect Prediction
Using Semi-Supervised Learning with Dimension Reduction.”
27th IEEE/ACM International Conference on Automated Software
Engineering (ASE’12), 314–17. https://doi.org/10.1145/2351676.2351734.
Madeyski, Lech, and Marian Jureczko. 2015. “Which Process Metrics
Can Significantly Improve Defect Prediction Models? An Empirical
Study.” Software Quality Journal 23 (3): 393–422. https://doi.org/10.1007/s11219-014-9241-7.
Mair, Carolyn, Martin Shepperd, and Magne Jørgensen. 2005. “An
Analysis of Data Sets Used to Train and Validate Cost Prediction
Systems.” SIGSOFT Software Engineering Notes (New York,
NY, USA) 30 (4): 1–6. https://doi.org/10.1145/1082983.1083166.
Martínez-Ballesteros, M., A. Troncoso, F. Martínez-Álvarez, and J. C.
Riquelme. 2016. “Improving a Multi-Objective Evolutionary
Algorithm to Discover Quantitative Association Rules.” Knowl.
Inf. Syst. (New York, NY, USA) 49 (2): 481–509. https://doi.org/10.1007/s10115-015-0911-y.
Matson, J. E., B. E. Barrett, and J. M. Mellichamp. 1994.
“Software Development Cost Estimation Using Function
Points.” IEEE Transactions on Software Engineering 20
(4): 275–87. https://doi.org/10.1109/32.277575.
Maxwell, Katrina. 2002. Applied Statistics for Software
Managers. Prentice Hall.
McCabe, T. J. 1976. “A Complexity Measure.” IEEE
Transactions on Software Engineering 2 (4): 308–20.
Mende, Thilo. 2010. “Replication of Defect Prediction Studies:
Problems, Pitfalls and Recommendations.” Proceedings of the
6th International Conference on Predictive Models in Software
Engineering (New York, NY, USA), PROMISE’10, 5:1–10. https://doi.org/10.1145/1868328.1868336.
Mende, Thilo, and Rainer Koschke. 2009. “Revisiting the Evaluation
of Defect Prediction Models.” Proceedings of the 5th
International Conference on Predictor Models in Software Engineering
(PROMISE’09) (New York, NY, USA), 1–10. https://doi.org/10.1145/1540438.1540448.
Mende, Thilo, and Rainer Koschke. 2010. “Effort-Aware Defect
Prediction Models.” Proceedings of the 2010 14th European
Conference on Software Maintenance and Reengineering (CSMR’10)
(Washington, DC, USA), CSMR’10, 107–16. https://doi.org/10.1109/CSMR.2010.18.
Mendes, Emilia, Sergio Di Martino, Filomena Ferrucci, and Carmine
Gravino. 2008. “Cross-Company Vs. Single-Company Web Effort Models
Using the Tukutuku Database: An Extended Study.” Journal of
Systems and Software 81 (5): 673–90. https://doi.org/10.1016/j.jss.2007.07.044.
Menzies, T., A. Butcher, A. Marcus, T. Zimmermann, and D. Cok. 2011.
“Local Vs. Global Models for Effort Estimation and Defect
Prediction.” 2011 26th IEEE/ACM International Conference on
Automated Software Engineering (ASE), 343–51. https://doi.org/10.1109/ASE.2011.6100072.
Menzies, T., J. Greenwald, and A. Frank. 2007. “Data Mining Static
Code Attributes to Learn Defect Predictors.” IEEE
Transactions on Software Engineering.
Menzies, Tim, Bora Caglayan, Ekrem Kocaguneli, Joe Krall, Fayola Peters,
and Burak Turhan. 2012. The PROMISE Repository of Empirical Software
Engineering Data. http://promisedata.googlecode.com.
Menzies, Tim, Alex Dekhtyar, Justin Distefano, and Jeremy Greenwald.
2007. “Problems with Precision: A Response to Comments on Data
Mining Static Code Attributes to Learn Defect Predictors.”
IEEE Transactions on Software Engineering (Los Alamitos, CA,
USA) 33 (9): 637–40. https://doi.org/10.1109/TSE.2007.70721.
Mikut, Ralf, and Markus Reischl. 2011. “Data Mining Tools.”
Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery 1 (5): 431–43. https://doi.org/10.1002/widm.24.
Mišić, Vojislav B, and Dejan N Tešić. 1998. “Estimation of Effort
and Complexity: An Object-Oriented Case Study.” Journal of
Systems and Software 41 (2): 133–43. https://doi.org/10.1016/S0164-1212(97)10014-0.
Mitchell, T. 1997. Machine Learning. McGraw Hill.
Miyazaki, Y., M. Terakado, K. Ozaki, and H. Nozaki. 1994. “Robust
Regression for Developing Software Estimation Models.”
Journal of Systems and Software 27 (1): 3–16. https://doi.org/10.1016/0164-1212(94)90110-4.
Mockus, Audris, Roy T. Fielding, and James D. Herbsleb. 2002. “Two
Case Studies of Open Source Software
Development: Apache and Mozilla.”
ACM Transactions on Software Engineering and Methodology 11
(3): 309–46.
Morasca, Sandro, and Luigi Lavazza. 2016. “Slope-Based
Fault-Proneness Thresholds for Software Engineering Measures.”
Proceedings of the 20th International Conference on Evaluation and
Assessment in Software Engineering (New York, NY, USA), EASE ’16,
12:1–10. https://doi.org/10.1145/2915970.2915997.
Moser, R., W. Pedrycz, and G. Succi. 2008. “A Comparative Analysis
of the Efficiency of Change Metrics and Static Code Attributes for
Defect Prediction.” 2008 ACM/IEEE 30th International
Conference on Software Engineering, May, 181–90. https://doi.org/10.1145/1368088.1368114.
Moser, Simon, Brian Henderson-Sellers, and Vojislav B Mišić. 1999.
“Cost Estimation Based on Business Models.” Journal of
Systems and Software 49 (1): 33–42. https://doi.org/10.1016/S0164-1212(99)00064-3.
Myrtveit, I., E. Stensrud, and M. Shepperd. 2005. “Reliability and
Validity in Comparative Studies of Software Prediction Models.”
IEEE Transactions on Software Engineering 31 (5): 380–91. https://doi.org/10.1109/TSE.2005.58.
Nagappan, Nachiappan, Andreas Zeller, Thomas Zimmermann, Kim Herzig, and
Brendan Murphy. 2010. “Change Bursts as Defect Predictors.”
Proceedings of the 21st IEEE International Symposium on Software
Reliability Engineering (ISSRE 2010) (San Jose, California, USA).
Nagappan, Nachiappan, Andreas Zeller, Thomas Zimmermann, Kim Herzig, and
Brendan Murphy. 2012. “Change Bursts as Defect Predictors.”
21st IEEE International Symposium on Software Reliability
Engineering (ISSRE 2012) (San Jose, California, USA).
Nussbaum, L., and S. Zacchiroli. 2010. “The Ultimate Debian
Database: Consolidating Bazaar Metadata for Quality Assurance and Data
Mining.” 7th IEEE Working Conference on Mining Software
Repositories (MSR 2010), May, 52–61. https://doi.org/10.1109/MSR.2010.5463277.
Ohloh. n.d. Ohloh. http://www.ohloh.net/.
Panichella, A., R. Oliveto, and A. De Lucia. 2014. “Cross-Project
Defect Prediction Models: L’union Fait La Force.” 2014
Software Evolution Week - IEEE Conference on Software Maintenance,
Reengineering, and Reverse Engineering (CSMR-WCRE), February,
164–73. https://doi.org/10.1109/CSMR-WCRE.2014.6747166.
Perini, A., A. Susi, and P. Avesani. 2012. “A Machine Learning
Approach to Software Requirements Prioritization.” IEEE
Transactions on Software Engineering, preprint. https://doi.org/10.1109/TSE.2012.52.
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning.
Morgan Kaufmann.
R Core Team. 2016. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing. https://www.R-project.org/.
Räihä, Outi. 2010. “A Survey on Search-Based Software
Design.” Computer Science Review 4 (4): 203–49.
https://doi.org/10.1016/j.cosrev.2010.06.001.
Rakotomalala, Ricco. 2005. “TANAGRA : Un Logiciel Gratuit Pour
l’enseignement Et La Recherche.” Actes de EGC’2005,
RNTI-e-3 2: 697–702.
Ratzinger, J., T. Sigmund, P. Vorburger, and H. Gall. 2007.
“Mining Software Evolution to Predict Refactoring.”
First International Symposium on Empirical Software Engineering and
Measurement (ESEM 2007), 354–63. https://doi.org/10.1109/ESEM.2007.9.
Raudys, S. J., and A. K. Jain. 1991. “Small Sample Size Effects in
Statistical Pattern Recognition: Recommendations for
Practitioners.” IEEE Transactions on Pattern Analysis and
Machine Intelligence 13 (3): 252–64. https://doi.org/10.1109/34.75512.
Robles, Gregorio, Jesus M. Gonzalez-Barahona, Daniel Izquierdo-Cortazar,
and Israel Herraiz. 2009. “Tools for the Study of the Usual Data
Sources Found in Libre Software Projects.” International
Journal of Open Source Software and Processes 1 (1): 24–45.
Rodriguez, D., R. Ruiz, J. Cuadrado, and J. Aguilar-Ruiz. 2007.
“Detecting Fault Modules Applying Feature Selection to
Classifiers.” IEEE International Conference on Information
Reuse and Integration (IRI 2007), 667–72. https://doi.org/10.1109/IRI.2007.4296696.
Rodriguez, D., R. Ruiz, J. C. Riquelme, and J. S. Aguilar-Ruiz. 2012.
“Searching for Rules to Detect Defective Modules: A Subgroup
Discovery Approach.” Information Sciences 191: 14–30. https://doi.org/10.1016/j.ins.2011.01.039.
Rodríguez, D., M. A. Sicilia, E. García, and R. Harrison. 2012.
“Empirical Findings on Team Size and Productivity in Software
Development.” Journal of Systems and Software 85 (3):
562–70. https://doi.org/10.1016/j.jss.2011.09.009.
Schofield, C. 1998. “An Empirical Investigation into Software
Effort Estimation by Analogy.” PhD thesis, Bournemouth
University.
Seiffert, C., T. M. Khoshgoftaar, J. Van Hulse, and A. Folleco. 2007.
“An Empirical Study of the Classification Performance of Learners
on Imbalanced and Noisy Software Quality Data.” 2007 IEEE
International Conference on Information Reuse and Integration, IEEE
IRI-2007, 651–58.
Shang, Weiyi, Bram Adams, and Ahmed E. Hassan. 2010. “An
Experience Report on Scaling Tools for Mining Software Repositories
Using MapReduce.” Proceedings of the IEEE/ACM International
Conference on Automated Software Engineering (New York, NY, USA),
ASE ’10, 275–84. https://doi.org/10.1145/1858996.1859050.
Shang, Weiyi, Bram Adams, and Ahmed E. Hassan. 2012. “Using
PIG as a Data Preparation Language for Large-Scale Mining
Software Repositories Studies: An Experience Report.” Journal
of Systems and Software 85 (10): 2195–204. https://doi.org/10.1016/j.jss.2011.07.034.
Shatnawi, Raed, Wei Li, James Swain, and Tim Newman. 2010.
“Finding Software Metrics Threshold Values Using ROC
Curves.” Journal of Software Maintenance and Evolution:
Research and Practice 22 (1): 1–16. https://doi.org/10.1002/smr.404.
Shearer, Colin. 2000. “The CRISP-DM Model: The New Blueprint for
Data Mining.” Journal of Data Warehousing 5 (4).
Shepperd, Martin, and Steve MacDonell. 2012. “Evaluating
Prediction Systems in Software Project Estimation.”
Information and Software Technology. https://doi.org/10.1016/j.infsof.2011.12.008.
Shepperd, M., and M. Cartwright. 2001. “Predicting with Sparse
Data.” IEEE Transactions on Software Engineering 27
(11): 987–98. https://doi.org/10.1109/32.965339.
Shepperd, M., and G. Kadoda. 2001. “Comparing Software Prediction
Techniques Using Simulation.” IEEE Transactions on Software
Engineering 27 (11): 1014–22. https://doi.org/10.1109/32.965341.
Shepperd, M., and C. Schofield. 1997. “Estimating Software Project
Effort Using Analogies.” IEEE Transactions on Software
Engineering 23 (11): 736–43. https://doi.org/10.1109/32.637387.
Shepperd, M., Qinbao Song, Zhongbin Sun, and C. Mair. 2013. “Data
Quality: Some Comments on the NASA Software Defect Datasets.”
IEEE Transactions on Software Engineering 39 (9): 1208–15. https://doi.org/10.1109/TSE.2013.11.
Shippey, Thomas, Tracy Hall, Steve Counsell, and David Bowes. 2016.
“So You Need More Method Level Datasets for Your Software Defect
Prediction? Voila!” 10th ACM/IEEE International Symposium on
Empirical Software Engineering and Measurement (ESEM’16) (New York,
NY, USA), ESEM’16, 12:1–6. https://doi.org/10.1145/2961111.2962620.
Singh, Yogesh, Arvinder Kaur, and Ruchika Malhotra. 2010.
“Empirical Validation of Object-Oriented Metrics for Predicting
Fault Proneness Models.” Software Quality Journal 18:
3–35. https://doi.org/10.1007/s11219-009-9079-6.
Song, Q., M. Shepperd, M. Cartwright, and C. Mair. 2006. “Software
Defect Association Mining and Defect Correction Effort
Prediction.” IEEE Transactions on Software Engineering
32 (2): 69–82. https://doi.org/10.1109/TSE.2006.1599417.
Tempero, Ewan, Craig Anslow, Jens Dietrich, et al. 2010. “Qualitas
Corpus: A Curated Collection of Java Code for Empirical Studies.”
2010 Asia Pacific Software Engineering Conference (APSEC2010),
December.
Thai-Nghe, Nguyen, Z. Gantner, and L. Schmidt-Thieme. 2011. “A New
Evaluation Measure for Learning from Imbalanced Data.” The
2011 International Joint Conference on Neural Networks (IJCNN’11),
537–42. https://doi.org/10.1109/IJCNN.2011.6033267.
Torgo, Luis. 2010. Data Mining with R: Learning with Case
Studies. 1st ed. Chapman & Hall/CRC.
Ugarte, M. D., A. F. Militino, and A. T. Arnholt. 2015. Probability
and Statistics with R. 2nd ed. Taylor & Francis. https://books.google.es/books?id=4aB0pwAACAAJ.
Van Antwerp, M., and G. Madey. 2008. Advances in the SourceForge
Research Data Archive (SRDA). Milan, Italy.
Van Hulse, J., T. M. Khoshgoftaar, and A. Napolitano. 2007.
“Experimental Perspectives on Learning from Imbalanced
Data.” Proceedings of the 24th International Conference on
Machine Learning (ICML07) (New York, USA).
Vandecruys, Olivier, David Martens, Bart Baesens, Christophe Mues, Manu
De Backer, and Raf Haesen. 2008. “Mining Software Repositories for
Comprehensible Software Fault Prediction Models.” Journal of
Systems and Software 81 (5): 823–39. https://doi.org/10.1016/j.jss.2007.07.034.
Vasa, Rajesh. 2010. “Growth and Change Dynamics in Open Source
Software Systems.” PhD thesis, Faculty of Information and
Communication Technologies, Swinburne University of Technology, Melbourne,
Australia. http://hdl.handle.net/1959.3/95058.
Vasa, Rajesh, Markus Lumpe, and Allan Jones. 2010. Helix -
Software Evolution Data Set. http://www.ict.swin.edu.au/research/projects/helix.
Wang, Huanjing, Taghi M. Khoshgoftaar, Randall Wald, and Amri
Napolitano. 2012. “A Comparative Study on the Stability of
Software Metric Selection Techniques.” 11th International
Conference on Machine Learning and Applications, ICMLA, 301–7.
Weimer, Westley, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest.
2009. “Automatically Finding Patches Using Genetic
Programming.” Proceedings of the 31st International
Conference on Software Engineering (ICSE’09) (Washington, DC, USA),
ICSE’09, 364–74. https://doi.org/10.1109/ICSE.2009.5070536.
Wen, Jianfeng, Shixian Li, Zhiyong Lin, Yong Hu, and Changqin Huang.
2012. “Systematic Literature Review of Machine Learning Based
Software Development Effort Estimation Models.” Information
and Software Technology 54 (1): 41–59. https://doi.org/10.1016/j.infsof.2011.09.002.
Williams, G. 2009. “Rattle: A Data Mining GUI for R.”
The R Journal 2 (2): 45–55.
Williams, Graham J. 2011. Data Mining with Rattle and
R: The Art of Excavating Data for Knowledge Discovery.
Use R! Springer. http://www.springer.com/gp/book/9781441998897.
Witten, I. H., and E. Frank. 2005. Data Mining: Practical Machine
Learning Tools and Techniques. 2nd ed. Morgan Kaufmann.
Witten, I. H., E. Frank, and M. A. Hall. 2011. Data Mining:
Practical Machine Learning Tools and Techniques. 3rd ed.
Morgan Kaufmann.
Wohlin, Claes, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn
Regnell, and Anders Wesslén. 2000. Experimentation in Software
Engineering: An Introduction. Kluwer Academic Publishers.
Woodfield, S. N., V. Y. Shen, and H. E. Dunsmore. 1981. “A Study
of Several Metrics for Programming Effort.” Journal of
Systems and Software 2 (2): 97–103. https://doi.org/10.1016/0164-1212(81)90029-7.
Wrobel, Stefan. 1997. “An Algorithm for Multi-Relational Discovery
of Subgroups.” Proceedings of the 1st European Symposium on
Principles of Data Mining, 78–87.
Wrobel, Stefan. 2001. Relational Data Mining. Edited by Saso
Dzeroski and Nada Lavrač. Springer.
Xie, Yihui. 2015. Dynamic Documents with R and
Knitr. 2nd ed. Chapman & Hall/CRC. http://yihui.name/knitr/.
Xie, Yihui. 2016a. Bookdown: Authoring Books with R Markdown.
https://github.com/rstudio/bookdown.
Xie, Yihui. 2016b. Knitr: A General-Purpose Package for Dynamic
Report Generation in R. https://CRAN.R-project.org/package=knitr.
Yu, L., and H. Liu. 2004a. “Efficient Feature Selection via
Analysis of Relevance and Redundancy.” Journal of Machine
Learning Research 5: 1205–24.
Yu, L., and H. Liu. 2004b. “Redundancy Based Feature Selection for
Microarray Data.” 10th ACM SIGKDD Int. Conf. on Knowledge
Discovery and Data Mining.
Zhang, H. 2009. “An Investigation of the Relationships Between
Lines of Code and Defects.” IEEE International Conference on
Software Maintenance, 274–83. https://doi.org/10.1109/ICSM.2009.5306304.
Zhang, Hongyu. 2008. “On the Distribution of Software
Faults.” IEEE Transactions on Software Engineering 34
(2): 301–2. https://doi.org/10.1109/TSE.2007.70771.
Zhang, Hongyu, and Xiuzhen Zhang. 2007. “Comments on ‘Data Mining
Static Code Attributes to Learn Defect Predictors’.” IEEE
Transactions on Software Engineering (Los Alamitos, CA, USA) 33
(9): 635–37. https://doi.org/10.1109/TSE.2007.70706.
Zhang, Yuanyuan, Mark Harman, Anthony Finkelstein, and S. Afshin
Mansouri. 2011. “Comparing the Performance of Metaheuristics for
the Analysis of Multi-Stakeholder Tradeoffs in Requirements
Optimisation.” Information and Software Technology 53
(7): 761–73. https://doi.org/10.1016/j.infsof.2011.02.001.
Zimmermann, Thomas, Nachiappan Nagappan, Harald Gall, Emanuel Giger, and
Brendan Murphy. 2009. “Cross-Project Defect Prediction: A Large
Scale Experiment on Data Vs. Domain Vs. Process.” Proceedings
of the 7th Joint Meeting of the European Software Engineering
Conference and the ACM SIGSOFT Symposium on the Foundations of Software
Engineering (New York, NY, USA), ESEC/FSE’09, 91–100. https://doi.org/10.1145/1595696.1595713.
Zimmermann, Thomas, Rahul Premraj, and Andreas Zeller. 2007.
“Predicting Defects for Eclipse.” Proceedings of the
Third International Workshop on Predictor Models in Software Engineering
(PROMISE’07) (Washington, DC, USA), PROMISE’07, 9–9.
https://doi.org/10.1109/PROMISE.2007.10.
Zimmermann, T., R. Premraj, and A. Zeller. 2007. “Predicting
Defects for Eclipse.” International Workshop on Predictor
Models in Software Engineering (PROMISE’07), May, 9. https://doi.org/10.1109/PROMISE.2007.10.