Feature Selection
Filter-based
Filter methods score each feature with a chosen statistical metric and keep or discard features based on that score, independently of any model.
chi-square test
Fisher score
correlation coefficient
variance threshold
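As a rough illustration of three of the metrics above, here is a minimal sketch using scikit-learn and NumPy. The iris dataset, the variance cutoff of 0.2, and k=2 are arbitrary assumptions chosen for the example, not recommended settings. The Fisher score is left out because scikit-learn does not ship a dedicated implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)  # 150 samples, 4 non-negative features

# Variance threshold: keep only features whose variance exceeds the cutoff.
# The cutoff of 0.2 is an arbitrary value chosen for illustration.
vt = VarianceThreshold(threshold=0.2)
X_vt = vt.fit_transform(X)
kept_by_variance = vt.get_support(indices=True)

# Chi-square test: keep the k features with the highest chi2 statistic
# against the class label. chi2 requires non-negative feature values.
skb = SelectKBest(score_func=chi2, k=2)
X_chi2 = skb.fit_transform(X, y)
kept_by_chi2 = skb.get_support(indices=True)

# Correlation coefficient: rank features by absolute Pearson correlation
# with the target (treating the class index as numeric, for illustration).
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
ranked_by_corr = np.argsort(corr)[::-1]  # most correlated feature first
```

Each filter looks only at the data and never at a downstream model, which is what makes these methods cheap compared with wrapper methods.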
Wrapper-based
Wrapper methods treat the selection of a feature subset as a search problem: candidate subsets are evaluated by training and scoring a model on each.
Sequential Feature Selection
Forward/Stepwise/Backward Selection
Embedded
Embedded methods use algorithms that have built-in feature selection methods.
Lasso
Tree-based models
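A minimal sketch of both embedded approaches with scikit-learn follows. The diabetes dataset, alpha=1.0, and the "mean" importance threshold are assumptions made for illustration only.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)  # 442 samples, 10 features
X = StandardScaler().fit_transform(X)  # put features on a common scale

# Lasso: the L1 penalty can shrink coefficients to exactly zero,
# so the features with non-zero coefficients are the selected ones.
lasso = Lasso(alpha=1.0).fit(X, y)  # alpha is an illustrative choice
lasso_selected = [j for j, c in enumerate(lasso.coef_) if c != 0.0]

# Tree-based model: keep features whose impurity-based importance
# is at least the mean importance across all features.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="mean").fit(X, y)
forest_selected = selector.get_support(indices=True)
```

In both cases the selection falls out of fitting the model itself, so no separate search over feature subsets is needed.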
Forward Selection
The procedure starts with an empty set of features, called the reduced set. The best of the original features is determined and added to the reduced set. At each subsequent iteration, the best of the remaining original features is added to the set.
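The loop described above can be sketched from scratch as follows. The notes do not say how "best" is measured, so this sketch assumes mean cross-validated accuracy of a logistic regression; the function name forward_selection, the wine dataset, and n_features=3 are likewise hypothetical choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def forward_selection(X, y, n_features, cv=5):
    """Greedily add the feature that gives the highest mean CV score."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    selected = []                        # the reduced set starts empty
    remaining = list(range(X.shape[1]))  # original features not yet chosen
    while len(selected) < n_features and remaining:
        # Score each candidate when added to the current reduced set.
        scores = {
            j: cross_val_score(model, X[:, selected + [j]], y, cv=cv).mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


X, y = load_wine(return_X_y=True)  # 178 samples, 13 features
chosen = forward_selection(X, y, n_features=3)
```

The returned list preserves the order in which features were added, which gives a rough ranking of how early each feature became useful.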
Backward Elimination
The procedure starts with the full set of features. At each step, it removes the worst feature remaining in the set.
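One way to try backward elimination is scikit-learn's SequentialFeatureSelector with direction="backward". This is a sketch under assumptions: the k-nearest-neighbors estimator, the iris dataset, and the target of 2 features are arbitrary, and "worst" is taken to mean the feature whose removal leaves the best cross-validated score.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# direction="backward" starts from all features and removes one per step,
# dropping the feature whose removal leaves the best cross-validated score.
sfs_backward = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,  # stop once 2 features remain
    direction="backward",
    cv=5,
)
sfs_backward.fit(X, y)
kept = sfs_backward.get_support(indices=True)  # indices of surviving features
```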
Sequential Feature Selection
A greedy procedure where, at each iteration, we choose the best new feature to add to the selected features based on a cross-validation score.
That is, in the forward direction, we start with 0 features and first choose the single feature with the highest score.
The procedure is repeated until we reach the desired number of selected features.
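The same procedure is available in scikit-learn as SequentialFeatureSelector. The sketch below assumes a scaled logistic regression, the wine dataset, and a target of 4 features, all chosen only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # 178 samples, 13 features
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Forward selection: start from 0 features and stop at n_features_to_select.
sfs = SequentialFeatureSelector(
    model, n_features_to_select=4, direction="forward", cv=5
)
X_reduced = sfs.fit_transform(X, y)
selected = sfs.get_support(indices=True)

# Compare the reduced feature set against the full one.
score_reduced = cross_val_score(model, X_reduced, y, cv=5).mean()
score_full = cross_val_score(model, X, y, cv=5).mean()
```

Note that score_reduced is likely optimistic here, because the features were selected on the same data used to score them; for an unbiased estimate the selector would need to sit inside the cross-validated pipeline.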
Embedded Feature Selection
Augment with noisy data
This approach can be applied to tree-based models such as XGBoost, DecisionTree, and RandomForest.
The idea is to inject noisy (random) features into the training dataset when training the model.
We run n-fold cross-validation and, in each fold, inject the noisy features, compute feature importances, and take as the threshold the position at which the first noisy feature appears in the importance ranking.
We keep the raw (original) features that rank ahead of this threshold, that is, those more important than the first noisy feature, and append them to a list.
We repeat this for every fold and then take the set of unique features from the final list.
This approach can help reduce model overfitting, because features that look no more important than random noise are discarded.
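The steps above can be sketched as follows. This is one possible reading of the procedure, not a definitive implementation: it assumes Gaussian noise columns, a RandomForestClassifier, the breast cancer dataset, 5 folds, and 5 noise features per fold, and it interprets the threshold as the rank position of the first noise column.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
n_raw = X.shape[1]
n_noise = 5  # number of injected noise columns (assumption)
rng = np.random.default_rng(0)

selected = []  # raw features kept, accumulated over all folds
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    # Inject fresh Gaussian noise columns for this fold.
    noise = rng.normal(size=(X_train.shape[0], n_noise))
    X_aug = np.hstack([X_train, noise])

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_aug, y_train)

    # Rank all columns by importance, most important first.
    order = np.argsort(model.feature_importances_)[::-1]
    # Threshold: rank position of the first noise column (index >= n_raw).
    threshold = int(np.argmax(order >= n_raw))
    # Keep only the raw features ranked ahead of that position.
    selected.extend(order[:threshold].tolist())

final_features = sorted(set(selected))  # unique features across folds
```

Taking the set of the accumulated list keeps any feature that beat the noise in at least one fold; a stricter variant (an assumption beyond the notes) would keep only features that beat the noise in every fold.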