Chapter 8 Feature Selection Example
Feature Selection in R with caret
library(caret)
library(doParallel) # parallel processing
library(dplyr) # Used by caret
library(pROC) # plot the ROC curve
library(foreign)
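doParallel is loaded above but a parallel backend is never registered. The following is a minimal sketch of how one could be set up so that train() runs its resampling iterations in parallel; the number of workers is an arbitrary choice.
cl <- makePSOCKcluster(4)   # start 4 worker processes; adjust to your machine
registerDoParallel(cl)      # register the backend that caret uses via foreach
## ... fit models with train() ...
## stopCluster(cl)          # release the workers when finished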
### Use the KC1 defect prediction data set
# Load the data and construct indices to divide it into training and test data sets.
#set.seed(10)
kc1 <- read.arff("./datasets/defectPred/D1/KC1.arff")

inTrain <- createDataPartition(y = kc1$Defective,
                               ## the outcome data are needed
                               p = .75,
                               ## the percentage of data in the training set
                               list = FALSE)
The function createDataPartition creates stratified partitions.
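Because the partition is stratified on Defective, the class proportions in the training rows should closely mirror those of the full data set. A quick check (a minimal sketch; the exact proportions depend on the data):
prop.table(table(kc1$Defective))           # class balance in the full data
prop.table(table(kc1$Defective[inTrain]))  # class balance in the training rows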
training <- kc1[inTrain, ]
nrow(training)
## [1] 1573
testing <- kc1[-inTrain, ]
nrow(testing)
## [1] 523
The train function can be used to
+ evaluate, using resampling, the effect of model tuning parameters on performance
+ choose the “optimal” model across these parameters
+ estimate model performance from a training set
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           repeats = 10)

gbmFit1 <- train(Defective ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 ## This last option is actually one for gbm()
                 ## that passes through
                 verbose = FALSE)
gbmFit1
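pROC is loaded above but not yet used; the following is a minimal sketch of how the fitted gbm model could be evaluated on the test set with an ROC curve. The positive (“defective”) class is assumed here to be the second factor level of Defective; adjust if your data are coded differently.
gbmProbs <- predict(gbmFit1, newdata = testing, type = "prob")
posClass <- levels(testing$Defective)[2]   # assumed label of the defective class
gbmROC <- roc(response = testing$Defective, predictor = gbmProbs[[posClass]])
plot(gbmROC)   # ROC curve on the test set
auc(gbmROC)    # area under the curve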
plsFit <- train(Defective ~ .,
                data = training,
                method = "pls",
                ## Center and scale the predictors for the training
                ## set and all future samples.
                preProc = c("center", "scale"))

testPred <- predict(plsFit, testing)
postResample(testPred, testing$Defective)    # test-set accuracy and Kappa
sensitivity(testPred, testing$Defective)     # the positive class defaults to the first factor level
confusionMatrix(testPred, testing$Defective)
When there are three or more classes, confusionMatrix will show the confusion matrix and a set of “one-versus-all” results.
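As a quick illustration of that multi-class output, here is a toy sketch using the built-in iris data with random predictions (only to show the shape of the output, not a real model):
set.seed(1)
irisPred <- factor(sample(levels(iris$Species), nrow(iris), replace = TRUE),
                   levels = levels(iris$Species))
confusionMatrix(irisPred, iris$Species)   # 3x3 table plus one-versus-all statistics per class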