Feature Selection Methods in Machine Learning

Introduction

Feature selection is the process of reducing the number of input features when developing a machine learning model. It is done to reduce the computational cost of the model and to improve its performance.

Features that have a high correlation with the output variable are selected for training the model. Selecting a subset of the input features is important because it helps build the most efficient model from the features that are most relevant to the target variable.

Building a model with redundant features may mislead the model and hamper its performance. Hence, feature selection is essential.

Categorization of feature selection

Feature selection is subdivided into two parts, namely:

  1. Supervised technique: It is the technique used for labelled data
  2. Unsupervised technique: It is the technique used for unlabelled data

For demonstration, I am using a Jupyter Notebook, and I will use the heart disease prediction dataset from Kaggle to implement the various feature selection techniques. Here are some of the methods for feature selection:

1. Filter method

The filter method scores each individual feature based on the amount of correlation it has with the target variable. It is a univariate analysis, since it checks how relevant each feature is to the target variable individually. The types of filter method are as follows:

a) Information gain method

The information gain method computes the reduction in entropy. Information gain comes from information theory and measures how much information a feature provides about another variable, here the target. Let’s see how the information gain method is used for feature selection:

First, I am going to load the dataset:

import pandas as pd

# Load the heart disease dataset
df = pd.read_csv("heart.csv")
df.head()

Output (the first five rows of the dataset)

Now, let’s implement the information gain method:

# Separate the input features from the target (the last column)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

from sklearn.feature_selection import mutual_info_classif

# Mutual information between each feature and the target
scores = mutual_info_classif(X, y)
print(scores)

Output

[0.00113135 0.         0.14604363 0.         0.09081394 0.01610141
 0.03330569 0.08534967 0.10247971 0.0602119  0.11768226 0.10865301
 0.16903598]

Let’s plot a bar chart for better visualization:

import matplotlib.pyplot as plt

# Plot the mutual information score of each feature
features = df.columns[0:13]
new_df = pd.Series(scores, features)
new_df.plot(kind = 'barh')
plt.ylabel("Features")
plt.xlabel("scores")
plt.title("Features with scores")
plt.show()

Output (bar chart of mutual information scores for each feature)

Looking at this bar chart, we can select the number of features as per our requirement. The feature ‘trtbps’ seems to have the lowest score, and features such as ‘sex’ and ‘age’ can also be dropped from the dataset while training the model.
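
If we want scikit-learn to do this selection for us, SelectKBest can keep only the highest-scoring features according to mutual information. Below is a minimal sketch; the choice of k = 8 is arbitrary and only for illustration:

from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 8 features with the highest mutual information scores (k is arbitrary here)
selector = SelectKBest(score_func = mutual_info_classif, k = 8)
X_top = selector.fit_transform(X, y)

# Names of the retained columns
print(X.columns[selector.get_support()])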

b) Chi-square method

The chi-square method is used for categorical data and calculates the chi-square statistic between each input feature and the target variable. The test assumes the null hypothesis that the feature and the target are independent, so a high chi-square score suggests the feature is relevant. The formula used for calculating chi-square is:

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, where Oᵢ is the observed frequency and Eᵢ is the expected frequency.
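
For example, if a cell's observed count is 30 while independence would predict an expected count of 25, that cell contributes (30 − 25)² / 25 = 1 to the statistic; summing these contributions over all cells gives the chi-square score of the feature.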

Now, let’s implement the chi-square method:

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Score every feature with the chi-square test (k = 'all' keeps all of them)
feature = SelectKBest(score_func = chi2, k = 'all')
best_features = feature.fit(X, y)
print(best_features.scores_)

Output

[ 23.28662399,   7.57683451,  62.59809791,  14.8239245 ,
  23.93639448,   0.20293368,   2.97827075, 188.32047169,
  38.91437697,  72.64425301,   9.8040952 ,  66.44076512,
  5.79185297]

Using these scores and features, let’s plot the bar chart for better understanding:

import matplotlib.pyplot as plt

features = df.columns[0:13]
new_df = pd.Series(best_features.scores_, features)
new_df.plot(kind = 'barh')
plt.ylabel("Features")
plt.xlabel("scores")
plt.title("Features with scores")
plt.show()

Output (bar chart of chi-square scores for each feature)

Looking at this bar chart, we can select the top 10 or top 8 features. You can also set k = 10 (say) instead of k = 'all' to select the top 10 features from the dataset, as shown below. The feature ‘thalachh’ has the highest score and the feature ‘fbs’ has the lowest score.
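
As a quick sketch of that k = 10 variant (the value of k is only an example), SelectKBest can also return the reduced feature matrix directly:

from sklearn.feature_selection import SelectKBest, chi2

# Select the 10 features with the highest chi-square scores
selector = SelectKBest(score_func = chi2, k = 10)
X_top10 = selector.fit_transform(X, y)

print(X.columns[selector.get_support()])  # names of the selected features
print(X_top10.shape)                      # (number of rows, 10)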

c) Correlation coefficient method

In this method, the correlation coefficient of each input feature with the target variable is calculated. Correlation can be positive or negative.

A positive correlation coefficient means that an increase or decrease in the feature variable is accompanied by a corresponding increase or decrease in the output variable. A negative correlation means that an increase in the feature is accompanied by a decrease in the target variable, and vice versa.

The correlation coefficient (r) has a value ranging from -1 to 1.

If r = 1, perfect positive correlation,

If r = 0, no correlation,

If r = -1, perfect negative correlation

Now, let’s see how correlation coefficient method is used for feature selection:

import seaborn as sns
import matplotlib.pyplot as plt

# Heatmap of the pairwise correlation matrix (including the target column)
plt.figure(figsize = (13, 10))
sns.heatmap(df.corr(), annot = True)
plt.show()

Output (correlation heatmap of the dataset)

We know that the correlation coefficient of a variable with itself is 1. Looking at the correlation matrix above, the features ‘cp’, ‘thalachh’ and ‘slp’ are highly positively correlated with the output variable, while ‘thall’, ‘caa’, ‘oldpeak’, ‘exng’, ‘age’ and ‘sex’ are negatively correlated with it. The remaining features do not have much correlation with the output variable, so we can drop them from the dataset.
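
A minimal sketch of that pruning step is shown below; the 0.2 cutoff is an arbitrary choice for illustration, and the target is taken to be the last column of the DataFrame, as in the code above:

# Absolute correlation of every feature with the target (the last column of df)
target_col = df.columns[-1]
corr_with_target = df.corr()[target_col].drop(target_col).abs()

# Keep only features whose absolute correlation exceeds a threshold (0.2 is arbitrary)
threshold = 0.2
selected = corr_with_target[corr_with_target > threshold].index.tolist()
print(selected)

X_reduced = X[selected]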

2. Wrapper method

The wrapper method does not use a statistical measure for feature selection. Instead, it takes a subset of features, trains the model on them and calculates the accuracy, and it repeats this process until it finds the subset of features that gives the best accuracy. Since it involves training the model several times, it is expensive and time consuming, and it is only suitable for small datasets.

a) Recursive Feature Elimination

Recursive Feature Elimination (RFE) recursively removes the least important features until the desired number of features is reached, which helps improve the performance and accuracy of the model.

from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Recursively eliminate features with a linear SVM until 8 remain
rfe = RFE(SVC(kernel = 'linear'), n_features_to_select = 8)
rfe.fit(X_train, y_train)
pred = rfe.predict(X_test)
print("Accuracy : ", accuracy_score(y_test, pred))

Output

Accuracy :  0.8524590163934426

This is how RFE is implemented to select the features and obtain the accuracy of the model.
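
To see which features the fitted rfe object actually kept, we can inspect its support mask and ranking; a brief sketch:

# Boolean mask of the retained features and their names
print(rfe.support_)
print(X_train.columns[rfe.support_])

# Ranking of the features: rank 1 means selected, higher numbers were eliminated earlier
print(rfe.ranking_)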

b) Forward selection method

The forward selection method is an iterative process that starts with no features in the model. In each iteration, it adds the feature that is most relevant to the target variable, and it continues until adding new features no longer improves model performance. We are going to use the same dataset as in the feature selection methods above. For this we need the mlxtend module, so let's install it first:

$ pip install mlxtend

from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector

# Forward selection: start with no features and grow to 10 with a KNN classifier
ffs = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors = 4), 
                                k_features = 10, 
                                forward = True, 
                                n_jobs = -1)
fs  = ffs.fit(X, y)
print(fs.k_feature_names_)
print(fs.k_score_)

Output

('age', 'sex', 'cp', 'fbs', 'restecg', 'exng', 'oldpeak', 'slp', 'caa', 'thall')
0.7625136612021859

These are the top 10 features that are most relevant to the output variable. We can select any number of features by specifying the value of k_features.
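
Once the selector is fitted, the dataset can be reduced to exactly those columns, either by indexing the DataFrame with fs.k_feature_names_ or with the selector's transform method; a short sketch reusing the fs object above:

# Reduce X to the selected features by column name
X_selected = X[list(fs.k_feature_names_)]
print(X_selected.shape)

# Equivalently, fs.transform(X) returns the reduced feature matrix as a NumPy array
X_selected_array = fs.transform(X)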

c) Backward elimination method

The backward elimination method is just the reverse of the forward selection method. It initially trains the model with all the features and, iteration by iteration, removes the least useful feature, keeping the best set of features for the model and thereby increasing its accuracy.

from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector

# Backward elimination: start with all features and shrink to 8 with a KNN classifier
ffs = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors = 4), 
                                k_features = 8, 
                                forward = False, 
                                n_jobs = -1)
fs  = ffs.fit(X, y)
print(fs.k_feature_names_)
print(fs.k_score_)

Output

('sex', 'cp', 'fbs', 'restecg', 'exng', 'slp', 'caa', 'thall')
0.8513114754098361

These are the top 8 features that are relevant to the target variable.

3. Embedded method

The embedded method performs feature selection while the machine learning model itself is being built.

LASSO regularization

With LASSO (L1) regularization, some of the coefficients are shrunk to zero, meaning the corresponding features are effectively multiplied by zero when estimating the target. These features can therefore be removed, because they do not contribute to the performance of the model.

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

# Logistic regression with an L1 (LASSO) penalty; the 'liblinear' solver supports l1
sfm = SelectFromModel(LogisticRegression(C = 1, penalty = 'l1', solver = 'liblinear'))
sfm.fit(X_train, y_train)
important_features = X_train.columns[(sfm.get_support())]
print(important_features)

Output

Index(['sex', 'cp', 'restecg', 'exng', 'oldpeak', 'caa', 'thall'], dtype='object')
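
To confirm which coefficients the L1 penalty drove to zero, we can inspect the fitted estimator inside the selector; a small sketch reusing the sfm object above:

# Coefficients of the fitted L1-regularised logistic regression, one per feature
coefs = pd.Series(sfm.estimator_.coef_[0], index = X_train.columns)
print(coefs)

# Features whose coefficient is exactly zero are the ones LASSO discarded
print(coefs[coefs == 0].index.tolist())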

Conclusion

Feature selection is important for filtering redundant features out of the dataset. The presence of redundant features can mislead the model and degrade its performance.

The filter method of feature selection uses a statistical approach to select features, while the wrapper method does not. The wrapper method is only suitable for small datasets and can be computationally very expensive on large datasets.

The embedded method selects features during model building, hence the name embedded. In summary, feature selection is important because not all features are relevant to the output variable, and selecting only a relevant subset of the available features improves the performance of the model.

Happy Learning 🙂
