Chapter 7 Feature Selection

Feature selection (FS) consists of selecting the most relevant attributes of a data set. FS is needed for the following reasons:

  • A reduced volume of data makes it feasible to apply a wider range of data mining or search techniques.

  • Irrelevant and redundant attributes can lead to less accurate and more complex models. Removing them also allows data mining algorithms to run faster.

  • Data for attributes found to be irrelevant or redundant no longer needs to be collected in the future.

FS algorithms designed with different evaluation criteria broadly fall into two categories:

  • The filter model relies on general characteristics of the data to evaluate and select feature subsets without involving any data mining algorithm.

  • The wrapper model requires one predetermined mining algorithm and uses its performance as the evaluation criterion. It searches for the features best suited to that algorithm with the aim of improving mining performance, but it tends to be more computationally expensive than the filter model [11, 12].
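The difference between the two models can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration and not a method prescribed by this chapter: the toy data set (one informative attribute, one noise attribute, one redundant copy of the informative attribute), absolute Pearson correlation as the filter criterion, and leave-one-out accuracy of a 1-nearest-neighbour classifier with greedy forward search as the wrapper criterion. The function names are hypothetical.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=20, seed=0):
    """Toy data (assumed): feature 0 informative, 1 noise, 2 a redundant copy of 0."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = 10 * label + rng.uniform(-1, 1)
            noise = rng.uniform(-1, 1)
            X.append([informative, noise, 2 * informative])
            y.append(label)
    return X, y


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def filter_select(X, y, k):
    """Filter model: score each feature from the data alone and keep the top k."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    hits = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features),
        )
        if y[nearest] == y[i]:
            hits += 1
    return hits / len(X)


def wrapper_select(X, y, evaluate=loo_accuracy):
    """Wrapper model: greedy forward search driven by the mining algorithm's score."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        scored = [(evaluate(X, y, selected + [f]), f) for f in sorted(remaining)]
        score, f = max(scored, key=lambda t: t[0])  # ties go to the lowest index
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return sorted(selected), best


X, y = make_toy_data()
print(filter_select(X, y, k=2))  # [0, 2]
print(wrapper_select(X, y))      # ([0], 1.0)
```

On this toy data the filter keeps both the informative attribute and its redundant copy, because it scores each attribute in isolation, while the wrapper stops after the first attribute because adding further attributes does not improve the classifier. The wrapper pays for this by re-running the classifier for every candidate subset, which reflects the higher computational cost noted above.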

7.1 Instance Selection

TBD

7.2 Missing Data Imputation

TBD