A machine learning model is a mathematical model with a number of parameters that are learned from the data. However, some parameters, known as hyperparameters, cannot be learned directly from the data. They are commonly chosen by humans, based on intuition or trial and error, before the actual training begins. Hyperparameters matter because they control aspects of the model such as its complexity or its learning rate, and so directly affect its performance. Models can have many hyperparameters, and finding the best combination can be treated as a search problem.
In this article, we will use hyperparameter tuning to fine-tune a machine learning algorithm, searching for its best parameters with the GridSearch cross-validation technique.
First of all, we will examine a built-in dataset from the scikit-learn library and apply a Support Vector Classification (SVC) algorithm to learn from this data.
Load the dataset into a Pandas DataFrame and print the first few rows.
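A minimal sketch of this step follows. The article does not say which built-in dataset it uses, so the breast cancer dataset (`load_breast_cancer`) is an assumption made purely for illustration; any binary classification dataset bundled with scikit-learn would serve.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Assumption: the dataset is not named in the article; breast cancer is a stand-in.
cancer = load_breast_cancer()

# Put the features into a DataFrame and append the class label as a column.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target

# Print the first five rows.
print(df.head())
```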
Split the dataset into training and test sets to prepare for applying the machine learning algorithm.
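One way to do the split is with `train_test_split`. The 70/30 ratio, the `random_state` value, and the dataset are all assumptions chosen for this sketch, not values taken from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: breast cancer dataset, used here only as an example.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

print(X_train.shape, X_test.shape)
```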
Train an SVC with default parameters on the training set, then predict the outcomes for the test set.
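The baseline could be sketched as below, again assuming the breast cancer dataset and a 70/30 split. Exact scores depend on the scikit-learn version: since version 0.22 the default `gamma` is `'scale'` rather than `'auto'`, so a recent install may not reproduce the figures quoted in this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumptions: dataset, split ratio and random_state are illustrative choices.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Fit an SVC with all hyperparameters left at their defaults.
model = SVC()
model.fit(X_train, y_train)

# Predict on the held-out test set and summarize per-class metrics.
predictions = model.predict(X_test)
baseline_accuracy = accuracy_score(y_test, predictions)
print(classification_report(y_test, predictions, zero_division=0))
```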
We can see that with default parameters the SVC predicts the test set with 61% accuracy. Notice that recall and precision for class 0 are both 0. This means the classifier is assigning every sample to a single class, i.e. class 1! Our model therefore needs to have its parameters tuned.
This is where GridSearch comes into the picture. We can search for better parameters using GridSearchCV.
What GridSearchCV does is a bit more involved than a plain fit. First, it fits the model for every parameter combination in the grid and scores each one with cross-validation to find the best combination. Once it has the best combination, it runs fit again on all data passed to fit (without cross-validation), to build a single new model using the best parameter setting.
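The two-stage behaviour described above can be sketched as follows. The parameter grid, the 5-fold setting, and the dataset are assumptions for illustration; `refit=True` (the default) is what triggers the final fit on the full training data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumptions: dataset, split and grid values are illustrative choices.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Every combination of these values is tried: 4 * 5 * 1 = 20 candidates.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

# Stage 1: 5-fold cross-validation for each candidate.
# Stage 2 (refit=True): refit the best candidate on all of X_train.
grid = GridSearchCV(SVC(), param_grid, cv=5, refit=True)
grid.fit(X_train, y_train)

print(len(grid.cv_results_["params"]), "candidates evaluated")
```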
We can inspect the best parameters found by GridSearchCV in the best_params_ attribute and the best estimator in the best_estimator_ attribute.
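As a sketch of reading these attributes, the snippet below runs a small search and prints them. The dataset and the reduced grid are assumptions chosen to keep the example quick; `best_score_` is the mean cross-validated score of the winning combination.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumptions: dataset, split and grid values are illustrative choices.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

param_grid = {"C": [1, 10, 100], "gamma": [0.001, 0.0001]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X_train, y_train)

# Dictionary holding the winning value for each searched parameter.
print(grid.best_params_)

# The SVC refitted on the whole training set with those values.
print(grid.best_estimator_)

# Mean cross-validated score achieved by the best combination.
print(grid.best_score_)
```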
Having found the best parameters for the model, we re-run the predictions on the test set to see how much the tuned model improves.
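A sketch of the re-prediction step, under the same assumed dataset and grid as before. Calling `predict` on the fitted search object delegates to the refitted best estimator, so no separate model needs to be built by hand.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumptions: dataset, split and grid values are illustrative choices.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.001, 0.0001]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, refit=True)
grid.fit(X_train, y_train)

# predict() uses the best estimator refitted on the full training set.
grid_predictions = grid.predict(X_test)
tuned_accuracy = accuracy_score(y_test, grid_predictions)

print(classification_report(y_test, grid_predictions, zero_division=0))
print("Test accuracy of tuned model:", tuned_accuracy)
```

The test set plays no part in the search itself, which is why its score is a fair check of the tuned model.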
Wow, after applying hyperparameter tuning with GridSearchCV, we found a much better model and improved the algorithm's accuracy from 61% to an impressive 95%.
Hyperparameter tuning is usually a good idea. However, if we optimize too aggressively against the same data, we can overfit the model, so we need to use it with care.
Happy tuning.