Appendix A: Preprocessing the ISBSG Repository
A.1 The ISBSG Repository
The International Software Benchmarking Standards Group (ISBSG), a non-profit organisation, maintains a software project management repository contributed by a variety of organisations. The ISBSG checks the validity of submissions and provides benchmarking information in return. Projects are described using over 60 attributes covering functional size, team size, development platform, lifecycle phases, and more. These attributes fall broadly into the following categories:
- Project context: type of organisation, business area, type of development
- Product characteristics: application type, user base
- Development characteristics: platform, languages, tools
- Project size data: function points (IFPUG, COSMIC, etc.)
- Qualitative factors: experience, use of methodologies
Before applying statistical or machine learning methods to this dataset, substantial preprocessing is required. The steps below describe the data cleaning process applied across multiple ISBSG releases.
A.2 Preprocessing Steps
A.2.1 Instance Selection
Data Quality Rating. Projects are classified from A (fully sound data) to D (fundamental shortcomings). Following ISBSG guidance, only projects rated A or B are retained; C and D rated projects are removed.
Unadjusted Function Point Rating. This rating reflects the quality of the functional size measurement. Again, only A and B rated projects are kept.
Count Approach. Projects sized using methods other than IFPUG, NESMA, Albrecht, or Dreger are removed, as they represent small, non-comparable portions of the database. NESMA is a compliant variant of IFPUG, Albrecht's technique is the precursor to IFPUG, and the Dreger method is a guide to IFPUG counting, so all of these variants are retained.
Recording Method. Only projects where total effort was recorded as Staff hours (Recorded) are retained. This method tracks effort from daily records of all hours expended by each person. Methods such as Staff hours (Derived) or Productive Time Only are excluded as more subjective.
Resource Level. Only Level 1 (development team effort, including project management and administration, but excluding user training, support teams, and end users) is used.
Homogenisation of nominal attributes. Many entries are blank or labelled "don't know". Equivalent labels are unified (e.g., COBOL 2 and COBOL II become COBOL). Nominal values representing less than 1% of all projects are removed as non-representative. For example, the Language Type attribute can take the values 2GL, 3GL, 4GL, 5GL, and ApG, but 2GL, 5GL, and ApG each account for less than 1% of projects, so only 3GL and 4GL are retained.
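The instance-selection rules above can be sketched as a single filter in base R. This is an illustrative sketch only: the CSV export and the column names (`DataQualityRating`, `UFPRating`, `CountApproach`, `RecordingMethod`, `ResourceLevel`) are hypothetical, as actual ISBSG field names vary between releases.

```r
# Hypothetical flat export of the repository; column names are illustrative.
raw <- read.csv("ISBSG_R10.csv", stringsAsFactors = FALSE)

# Keep only projects passing every quality filter described above.
keep <- raw$DataQualityRating %in% c("A", "B") &
        raw$UFPRating        %in% c("A", "B") &
        raw$CountApproach    %in% c("IFPUG", "NESMA", "Albrecht", "Dreger") &
        raw$RecordingMethod  == "Staff hours (Recorded)" &
        raw$ResourceLevel    == 1
d <- raw[keep, ]
```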
A.2.2 Attribute Selection
After instance cleaning, some attributes contain a single value and provide no information. Attributes with more than 85% missing values are also removed. Where multiple attributes measure the same characteristic, the ISBSG-recommended one is kept: Normalised Level 1 Productivity Delivery Rate for productivity and Normalised Level 1 Work Effort for effort.
A final manual check removes remaining inconsistencies such as a language being classified under two different language types (e.g., Visual Basic appearing as both 3GL and 4GL).
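The two automatic attribute-selection rules (drop single-valued attributes, drop attributes with more than 85% missing values) can be sketched in base R, assuming `d` is the data frame produced by the instance-selection step:

```r
# Attributes with at most one distinct non-missing value carry no information.
single_valued  <- sapply(d, function(col) length(unique(na.omit(col))) <= 1)

# Attributes with more than 85% missing values are also dropped.
mostly_missing <- colMeans(is.na(d)) > 0.85

d <- d[, !(single_valued | mostly_missing)]
```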
A.3 Selected Attributes
The table below summarises the attributes retained after preprocessing (approximately 951 instances from ISBSG release 10).
| Attribute | Type (Range) | % Missing |
|---|---|---|
| FP — Unadjusted Function Points | Ratio [3–4,911] | 0% |
| Eff — Normalised Level 1 Work Effort (hours) | Ratio [4–73,920] | 0% |
| PDR — Normalised Level 1 Productivity Delivery Rate (hrs/FP) | Ratio [0.1–378.1] | 0% |
| DevType — Development type | Nominal {Enhancement, NewDev} | 0% |
| OrgType — Organisation type | Nominal {Insurance, Ordering, Banking, Manufacturing, …} | 40% |
| Platf — Development platform | Nominal {MF, PC, Multi, MR} | 51% |
| LangType — Language type | Nominal {3GL, 4GL} | 25% |
| PrimaryProgLang — Primary programming language | Nominal {Java, C, COBOL, PL, VB, SQL, C++, NATURAL, …} | 25% |
| MTS — Maximum Team Size | Ratio [0.5–309] | 58% |
| ATS — Average Team Size | Ratio [1–41] | 79% |
A.4 Descriptive Statistics
After preprocessing, the continuous variables show heavily right-skewed distributions. The Shapiro-Wilk test rejects normality for every one of them, so parametric tests should not be applied without a prior transformation (e.g., a log transform).
| Statistic | FP | Eff | PDR | MTS | ATS |
|---|---|---|---|---|---|
| Count | 951 | 951 | 951 | 401 | 200 |
| Mean | 302.7 | 3,375.3 | 17.9 | 8.4 | 7.9 |
| Median | 135 | 1,521 | 11.3 | 5 | 5 |
| Std Dev | 490.1 | 6,055.3 | 26.9 | 17.0 | 7.2 |
| Min | 3 | 4 | 0.1 | 0.5 | 1 |
| Max | 4,911 | 73,920 | 387.1 | 309 | 41 |
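The normality claim can be checked in one pass over the numeric attributes. The sketch below assumes the preprocessed ARFF file named in the code example; p-values near zero reject normality.

```r
library(foreign)
d <- read.arff("datasets/ISBSGv10_AttributesSelected_952Instances_8Att_Train_CLast.arff")

# Shapiro-Wilk test on each numeric attribute, ignoring missing values.
num <- Filter(is.numeric, d)
round(sapply(num, function(col) shapiro.test(na.omit(col))$p.value), 4)
```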
A.5 R Code Example
The snippet below loads the preprocessed ISBSG ARFF file and explores the distribution of team size and productivity.
```r
library(foreign)
# Load preprocessed ISBSG data
d <- read.arff("datasets/ISBSGv10_AttributesSelected_952Instances_8Att_Train_CLast.arff")
str(d)

# Check normality of MaxTeamSize with and without a log transform
x <- d$MaxTeamSize
par(mfrow = c(1, 2))
qqnorm(x, main = "MaxTeamSize (original)")
qqline(x)
qqnorm(log(x), main = "MaxTeamSize (log scale)")
qqline(log(x))
shapiro.test(log(x))

# Compare productivity of large teams (>= 9) vs small teams
d2 <- read.arff("datasets/ATSNonEmpty_200Inst.arff")
ats  <- d2$AverageTeamSize
prod <- d2$NormalisedLevel1PDR
boxplot(prod[ats >= 9], prod[ats < 9],
        names = c("Large team (>=9)", "Small team (<9)"),
        ylab = "Productivity (PDR, hrs/FP)",
        main = "Team size vs productivity (ISBSG)")
wilcox.test(prod[ats >= 9], prod[ats < 9])
```