Feature Selection


Filter Based

  • We specify some metric and based on that filter features.

    • chi-square test

    • fisher score

    • correlation coefficient

    • variance threshold

Wrapper-based

  • Wrapper methods consider the selection of a set of features as a search problem.

    • Sequential Feature Selection

    • Forward/Stepwise/Backward Selection

Embedded

  • Embedded methods use algorithms that have built-in feature selection methods.

    • Lasso

    • Tree based models

Forward Selection

  • The procedure starts with an empty set of features [reduced set]. The best of the original features is determined and added to the reduced set. At each subsequent iteration, the best of the remaining original attributes is added to the set.

Backward Elimination

  • The procedure starts with the full set of attributes. At each step, it removes the worst attribute remaining in the set.

Sequential Feature Selection

  • Greedy procedure where, at each iteration, we choose the best new feature to add to our selected features based on a cross-validation score.

  • That is, we start with 0 features and choose the best single feature with the highest score.

  • The procedure is repeated until we reach the desired number of selected features.

Embedded Feature Selection

  • Augment with noisy data

  • Can apply this approach to Tree based models: XGBoost, DecisionTree, RandomForrest

  • The idea is that we can inject noisy data features into our input training dataset when training model

  • Here we can perform cross-validation with n-folds where at each n-fold inject noisy features in each iteration and determine the threshold in which the first noisy feature is selected when computing feature importance.

  • We select the raw features that are less than this threshold and append to a list.

  • We do this iteratively for each fold and then take a set of the final appended list.

  • This approach can help minimize model overfitting.