9 Effort Estimation Data in Software Engineering
It is worth highlighting the case of software effort estimation datasets, which have several peculiarities. First, most effort estimation datasets used in the literature are scattered across research papers, with the exception of a few kept in the PROMISE repository. Mair et al (2005) analysed the datasets available in the field of cost estimation, identifying 65 different datasets in 50 papers.
Second, their size is very small, with the exception of the ISBSG repository discussed previously (of which a small sample is available through PROMISE) and the China dataset with 499 instances.
Third, some can be quite old, collected in contexts and at times that are not applicable to current development environments. Mair et al (2005) also noted that the oldest datasets (COCOMO, Desharnais, Kemerer, and Albrecht and Gaffney) tend to be the most studied ones and among the most relevant. From the artificial intelligence or data mining point of view, effort estimation has mainly been tackled with different types of regression techniques and, more recently, with techniques typically considered under the umbrella of data mining. As the number of examples per dataset increases, other machine learning techniques are also being studied (e.g., Dejaeger et al (2012) report a comparison of several machine learning techniques for effort estimation, with only 5 of the 9 datasets used being publicly available). In general, however, the small number of instances hinders the application of machine learning techniques.
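Because most of these datasets contain only a few dozen projects, studies typically evaluate estimation models with leave-one-out cross-validation rather than a held-out test set. The sketch below illustrates this with a 1-nearest-neighbour predictor in the spirit of estimation by analogy, scored with the mean magnitude of relative error (MMRE); the (size, effort) pairs are invented for illustration and any of the regression or machine learning techniques above would slot in the same way:

```python
# Leave-one-out cross-validation of a 1-nearest-neighbour ("analogy")
# predictor on a toy dataset; the (size, effort) pairs are invented.
projects = [(50, 400), (120, 1100), (300, 3200),
            (500, 6000), (800, 11000), (200, 1900)]

def predict(size, pool):
    # Analogy: reuse the effort of the most similar (closest-size) project.
    nearest = min(pool, key=lambda p: abs(p[0] - size))
    return nearest[1]

mres = []
for i, (size, actual) in enumerate(projects):
    pool = projects[:i] + projects[i + 1:]        # leave one project out
    estimate = predict(size, pool)
    mres.append(abs(actual - estimate) / actual)  # magnitude of relative error

mmre = sum(mres) / len(mres)                      # mean MRE over all folds
print(f"MMRE = {mmre:.2f}")
```

With only a handful of projects, every instance matters, which is why leave-one-out (rather than, say, 10-fold) validation dominates in this literature.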
However, software effort and cost estimation remains one of the main challenges in software engineering and has attracted a great deal of interest from researchers (2007). For example, there are continuing analyses of whether software development follows economies or diseconomies of scale (see Kitchenham (2002)).
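The economies-versus-diseconomies question is usually framed with the power-law model effort = a * size^b, where an exponent b > 1 indicates diseconomies of scale (effort grows faster than size). A minimal sketch of estimating b by ordinary least squares in log-log space, with project figures invented purely for illustration:

```python
import math

# Hypothetical (size in function points, effort in person-hours) pairs,
# invented for illustration; not taken from any real dataset.
projects = [(50, 400), (120, 1100), (300, 3200), (500, 6000), (800, 11000)]

# Fit log(effort) = log(a) + b * log(size) by ordinary least squares.
xs = [math.log(s) for s, _ in projects]
ys = [math.log(e) for _, e in projects]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
a = math.exp(my - b * mx)

# b > 1 suggests diseconomies of scale for this toy data.
print(f"effort ~ {a:.2f} * size^{b:.2f}")
```

This is the same functional form used by COCOMO-style models; the debate in the literature concerns whether fitted exponents on real datasets fall above or below 1.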
Table @ref(tab:effEstimation), following Mair et al (2005), lists the main openly available cost/effort datasets in the literature together with their primary reference.
| Reference | Instances | Attributes |
|---|---|---|
| Abran and Robillard (1996) | 21 | 31 |
| Albrecht-Gaffney (1983) | 24 | 7 |
| Bailey and Basili (1981) | 18 | 9 |
| Belady and Lehman (1979) | 33 | |
| Boehm (aka COCOMO Dataset) (1981) | 63 | 43 |
| China dataset | 499 | 18 |
| Desharnais (1988) | 61 | 10 |
| Dolado (1997) | 24 | 7 |
| Hastings and Sajeev (2001) | 8 | 14 |
| Heiat and Heiat (1997) | 35 | 4 |
| Jeffery and Stathis (1996) | 17 | 7 |
| Jorgensen (2004) | 47 | 4 |
| Jorgensen et al. (2003) | 20 | 4 |
| Kemerer (1987) | 15 | 5 |
| Kitchenham (Mermaid 2) (2002) | 30 | 5 |
| Kitchenham et al. (CSC) (2002) | 145 | 9 |
| Kitchenham and Taylor (ICL) (1985) | 10 | 6 |
| Kitchenham and Taylor (BT System X) (1985) | 10 | 3 |
| Kitchenham and Taylor (BT Software Houses) (1985) | 12 | 6 |
| Li et al. (USP05) (2007) | 202 | 16 |
| Mišić and Tešić (1998) | 6 | 16 |
| Maxwell (Dev Effort) (2002) | 63 | 32 |
| Maxwell (Maintenance Eff) (2002) | 67 | 28 |
| Miyazaki et al. (1994) | 47 | 9 |
| Moser et al. (1999) | 37 | 4 |
| Shepperd and Cartwright (2001) | 39 | 3 |
| Shepperd and Schofield (Telecom 1) (1997) | 18 | 5 |
| Shepperd and Schofield (Real-time 1) (1997) | 21 | 4 |
| Schofield (Mermaid) (1998) | 30 | 18 |
| Schofield (Finnish) (1998) | 39 | 30 |
| Schofield (Hughes) (1998) | 33 | 14 |
| Woodfield et al. (1981) | 63 | 8 |
9.1 What Is Missing in Many Effort Datasets
Beyond the number of projects, modern effort-estimation studies should report:
- Context variables: development process, team size, domain, and technology stack.
- Temporal information: project start/end period and release cadence.
- Effort measurement protocol: person-hours vs person-days, inclusion/exclusion rules.
- Feature definitions: exact meaning and units of each predictor.
- Currency normalization (for cost): inflation year and conversion policy.
- Handling of outliers: whether extreme projects were removed or transformed.
- External validity notes: why results may or may not transfer to current agile/DevOps contexts.
Without this metadata, models are difficult to compare and often hard to reuse in practice.
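One lightweight way to make that metadata explicit is to ship a machine-readable record alongside each dataset. The sketch below is our own suggestion of such a schema, mirroring the checklist above; the field names and the example record are hypothetical, not an existing standard:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class EffortDatasetMetadata:
    # Field names mirror the checklist above; they are a suggestion only.
    name: str
    instances: int
    effort_unit: str                     # person-hours vs person-days, inclusion rules
    collection_period: str               # temporal information (start/end period)
    context: str                         # process, team size, domain, technology
    feature_definitions: Dict[str, str]  # exact meaning and units of each predictor
    outlier_policy: str                  # removed, transformed, or kept as-is
    currency_year: Optional[int] = None  # inflation year, only if cost is recorded

# Hypothetical record, for illustration only.
meta = EffortDatasetMetadata(
    name="ToyRepo",
    instances=42,
    effort_unit="person-hours (development only, no management overhead)",
    collection_period="2019-2023",
    context="agile, in-house web development, teams of 3-8",
    feature_definitions={"size": "function points (IFPUG 4.x)"},
    outlier_policy="kept; effort log-transformed",
)
print(meta.name, meta.instances)
```

A record like this travels with the data, so later studies can judge comparability and external validity without digging through the original paper.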