An introduction to and implementation of keras-tuner

Choosing the right set of hyperparameters for your models

Shivani Shimpi
Edvora Inc

--

When it comes to deep learning, one of the most critical things to work with is the hyperparameters. For those of you who don’t know, hyperparameters are the variables that govern the structure and training of your neural network. They include, but are not limited to, the number of layers, the number of neurons, the learning rate, and the number of epochs. Whenever you create a model in deep learning, the initial version is rarely perfect unless you hit a home run, so you optimize the model by tweaking the hyperparameters before picking another candidate.

This tutorial is not going to help you build TensorFlow models; instead, it is a guide to using keras-tuner on your existing models.

Let’s take a simple regression problem and construct a very basic model with three fully connected (dense) layers, with 16, 32, and 1 unit(s) in the first, second, and last layer respectively, and relu activations. Let’s keep the learning rate at 0.001 for the Adam optimizer and train for 50 epochs. If you want to follow along, a sketch of that code is below.
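A minimal sketch, assuming the data is already loaded as NumPy arrays; trainFeatures is the name used later in this post, while trainLabels is an assumed name for the regression targets:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# trainFeatures is the feature array used throughout this post;
# trainLabels is an assumed name for the regression targets.
model = Sequential([
    Dense(16, activation='relu', input_shape=trainFeatures.shape[1:]),
    Dense(32, activation='relu'),
    Dense(1)  # single output unit for regression
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='mse'
)

model.fit(trainFeatures, trainLabels, epochs=50)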

Now let’s say your model didn’t perform very well; in fact, it was terrible, and you want to try different hyperparameters. Instead of manually tweaking everything, we can automate the process and leave it to keras-tuner to find the optimal combination.

Before we begin, you need to ensure that you’re working with TensorFlow 2.x, which ships with Keras built in; if you’re working on older versions you might have to install Keras manually. Once you have that, you can install keras-tuner.

$ pip install -U keras-tuner

We will be using the RandomSearch class, which searches for the best set of hyperparameters by sampling combinations picked at random.

When you read the documentation you might come across the term hypermodel: a hypermodel is simply a model built for tuning hyperparameters. Let’s see how to build one.

We will be creating a LOG_DIR to keep a log of all the models that have been tried. In hypermodel.py, sketched below, you may notice that we use hp.Int in two ways and hp.Choice once; let’s understand them bit by bit.
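A minimal sketch of what hypermodel.py could look like, assuming trainFeatures from the baseline above is in scope; the LOG_DIR naming is an assumption, and the build function takes the hp argument that keras-tuner passes in:

import time
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

LOG_DIR = f"logs_{int(time.time())}"  # assumed naming; holds the tuner's trial logs

def build_model(hp):
    model = Sequential()
    # Tune the width of the first Dense layer between 12 and 256 units
    model.add(Dense(
        hp.Int('input_units', min_value=12, max_value=256),
        input_shape=trainFeatures.shape[1:]
    ))
    model.add(Activation('relu'))
    # Tune how many hidden layers to stack (1 to 5) and the width of each
    for i in range(hp.Int("n_layers", min_value=1, max_value=5)):
        model.add(Dense(hp.Int(f"dense_{i}_units", 32, 256)))
        model.add(Activation('relu'))
    model.add(Dense(1))  # single output unit for regression
    # Tune the learning rate over a fixed list of candidates
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice('lr', [0.001, 0.01, 0.002, 0.02])),
        loss='mse'
    )
    return model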

  • hp.Int: This function selects an integer at random from the range you provide. If you observe the following code,

model.add(Dense(
    hp.Int('input_units', min_value=12, max_value=256),
    input_shape=trainFeatures.shape[1:]
))

you will notice that the first parameter is a string, input_units in our case, which you can change to whatever label you want reported for that hyperparameter; the min_value and max_value parameters set the range for the number of neurons/units in the Dense layer.
Along similar lines, the next usage of hp.Int is as follows:

for i in range(hp.Int("n_layers", min_value=1, max_value=5)):
    model.add(Dense(hp.Int(f"dense_{i}_units", 32, 256)))
    model.add(Activation('relu'))

This means that you loop over the Dense and Activation layers to decide how many of those layers you want; instead of input_units, the string here is n_layers, and min_value and max_value are 1 and 5, meaning the model can have anywhere between 1 and 5 such layers.

  • hp.Choice: This function selects values from the list you provide it. If you observe the following code,

optimizer=tf.keras.optimizers.Adam(
    hp.Choice('lr', [0.001, 0.01, 0.002, 0.02])
)

you will notice that the first parameter is again a string, lr, but this time we pass a list of values, [0.001, 0.01, 0.002, 0.02]; this means the hypermodel will only pick values from this list.

Instead of hp.Choice, you could also use hp.Float to set the learning rate, but I will let you explore that one further.
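For instance, a hypothetical hp.Float version could sample the learning rate from a continuous range on a log scale:

# Hypothetical alternative: sample the learning rate continuously
optimizer=tf.keras.optimizers.Adam(
    hp.Float('lr', min_value=1e-4, max_value=1e-2, sampling='log')
)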

Once we have built the hypermodel, we now have to instantiate it. There are four tuner classes for doing that: RandomSearch, Hyperband, BayesianOptimization, and Sklearn. We will be using RandomSearch to instantiate our tuner. You could do that as follows.
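A minimal sketch, assuming the build_model function and LOG_DIR defined above, and assuming val_loss as the objective:

from kerastuner.tuners import RandomSearch  # 'import keras_tuner' in newer releases

tuner = RandomSearch(
    build_model,               # the hypermodel function from above
    objective='val_loss',      # assumed metric to optimize
    max_trials=5,              # number of hyperparameter combinations to try
    executions_per_trial=3,    # train each candidate model 3 times
    directory=LOG_DIR          # where trial logs and checkpoints go
)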

Let’s understand a little bit more about the parameters we have specified while instantiating the tuner.

  • max_trials: The number of random hyperparameter combinations to try out or, more clearly, the number of models you want to create.
  • executions_per_trial: The number of times the same model is built and trained. You might want to set it to 3 just to average the runs and ensure that a model didn’t simply hit a home run and is actually performing well.
  • objective: The metric that you want to optimize or, more clearly, the objective that you want to achieve out of the whole process.

Okay, now that we have instantiated our tuner, we need to search for the best model, and for that we call the tuner’s search method. It essentially takes the same parameters as model.fit.
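A sketch of that call, with the batch size and validation split as assumptions (the validation data provides the val_loss objective from above):

tuner.search(
    trainFeatures, trainLabels,  # same data you would pass to model.fit
    epochs=50,
    batch_size=32,               # assumed
    validation_split=0.2         # assumed; supplies the val_loss objective
)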

Once you run the tuner-search code block, the output displays the best objective value along with the total time elapsed.

Now, in case you want to reuse the tuner or analyze the best models later, you can save and load it as a pickle object as follows.

import time
import pickle

# Save the tuner as a pickle object
with open(f'tuner_{int(time.time())}.pkl', 'wb') as f:
    pickle.dump(tuner, f)

# Load the saved tuner back
tuner = pickle.load(open('tuner_1610218332.pkl', 'rb'))

Going back to our tuner, let’s extract some more information from it.
tuner.results_summary() will give you the top ten trials, whereas tuner.get_best_models()[0].summary() will give you the summary of the best model, which looks exactly like the summary of any other TensorFlow model.
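For example, together with get_best_hyperparameters for reading off the winning values:

# Top ten trials ranked by the objective
tuner.results_summary()

# Keras-style summary of the single best model
tuner.get_best_models()[0].summary()

# The winning hyperparameter values as a plain dict
best_hp = tuner.get_best_hyperparameters()[0]
print(best_hp.values)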


That’s it from my side. Thanks for reading! Hope you learned something new.

--

Shivani Shimpi
Edvora Inc

Working on a breakthrough idea? Let’s innovate! Reach out: shivani@edvora.com | AI | Edvora | Distributed Ledger Full-stack Development | DeepTech